Fig. 1 - Sequence information for C-LytA.

SEQ ID NO:1 - amino acid sequence of C-LytA repeat 1 

GWQKNDTGYWYVHSD 15 

SEQ ID NO:2 - amino acid sequence of C-LytA repeat 2 

GSYPKDKFEKINGTWYYFDSS 21

SEQ ID NO:3 - amino acid sequence of C-LytA repeat 3 

GYMLADRWRKHTDGNWYWFDNS 22 

SEQ ID NO:4 - amino acid sequence of C-LytA repeat 4 

GEMATGWKKIADKWYYFNEE 20

SEQ ID NO:5 - amino acid sequence of C-LytA repeat 5 

GAMKTGWVKYKDTWYYLDAKE 21

SEQ ID NO:6 - amino acid sequence of C-LytA repeat 6 

GAMVSNAFIQSADGTGWYYLKPD 23

SEQ ID NO:7 - amino acid sequence of C-LytA choline-binding domain

GWQKNDTGYW YVHSDGSYPK DKFEKINGTW YYFDSSGYML ADRWRKHTDG NWYWFDNSGE 60 
MATGWKKIAD KWYYFNEEGA MKTGWVKYKD TWYYLDAKEG AMVSNAFIQS ADGTGWYYLK 120 
PDGTLADRPE FTVEPDGLIT VK 142 
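
As a consistency check, the six repeats listed as SEQ ID NO:1 to 6 should tile the start of the choline-binding domain of SEQ ID NO:7. The following Python sketch is illustrative only and not part of the original disclosure; it uses the repeat sequences as they read when aligned against SEQ ID NO:7, and the variable names are arbitrary.

```python
# Repeats 1-6 of C-LytA (SEQ ID NO:1-6), read against SEQ ID NO:7.
REPEATS = [
    "GWQKNDTGYWYVHSD",
    "GSYPKDKFEKINGTWYYFDSS",
    "GYMLADRWRKHTDGNWYWFDNS",
    "GEMATGWKKIADKWYYFNEE",
    "GAMKTGWVKYKDTWYYLDAKE",
    "GAMVSNAFIQSADGTGWYYLKPD",
]

# Choline-binding domain (SEQ ID NO:7), written in blocks of ten as in the figure.
DOMAIN = (
    "GWQKNDTGYW YVHSDGSYPK DKFEKINGTW YYFDSSGYML ADRWRKHTDG NWYWFDNSGE "
    "MATGWKKIAD KWYYFNEEGA MKTGWVKYKD TWYYLDAKEG AMVSNAFIQS ADGTGWYYLK "
    "PDGTLADRPE FTVEPDGLIT VK"
).replace(" ", "")

tiled = "".join(REPEATS)       # repeats joined head to tail
tail = DOMAIN[len(tiled):]     # residues of the domain that follow repeat 6

print(len(DOMAIN), len(tiled), DOMAIN.startswith(tiled), tail)
```

Under this reading the six repeats account for residues 1 to 122 of the domain, and the remaining 20 residues follow repeat 6.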

SEQ ID NO:8 - amino acid sequence of C-LytA domain from truncated repeat 1 to repeat 
6 (as part of our constructs shown in figure 2) 

YVHSDGSYPKDKFEKINGTWYYFDSSGYMLADRWRKHTDGNWYWFDNSGEMATGWKKIADKWYYFNEEGAMKT
GWVKYKDTWYYLDAKEGAMVSNAFIQSADGTGWYYLKPD

SEQ ID NO:9 - DNA sequence encoding the amino acid sequence of SEQ ID NO:1 
ggctggcaga agaatgacac tggctactgg tacgtacatt cagac 

SEQ ID NO:10 - DNA sequence encoding the amino acid sequence of SEQ ID NO:2 
ggctcttatc caaaagacaa gtttgagaaa atcaatggca cttggtacta ctttgacagt tca

SEQ ID NO:11 - DNA sequence encoding the amino acid sequence of SEQ ID NO:3
ggctatatgc ttgcagaccg ctggaggaag cacacagacg gcaactggta ctggttcgac aactca 
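
The pairing between the DNA listings and the repeat sequences can be checked by translating in frame 1 with the standard genetic code. The sketch below is an illustration, not part of the original disclosure; `translate` and the table construction are hypothetical helper names, and only SEQ ID NO:9 and SEQ ID NO:11 are used as inputs.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string (whitespace and case ignored) in frame 1."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


seq_id_9 = "ggctggcaga agaatgacac tggctactgg tacgtacatt cagac"
seq_id_11 = "ggctatatgc ttgcagaccg ctggaggaag cacacagacg gcaactggta ctggttcgac aactca"

print(translate(seq_id_9))   # expected to match repeat 1 (SEQ ID NO:1)
print(translate(seq_id_11))  # expected to match repeat 3 (SEQ ID NO:3)
```

The same helper can be applied to the longer listings to spot characters that were misread during text extraction, since a single wrong base usually changes the translated residue.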

SEQ ID NO:12 - DNA sequence encoding the amino acid sequence of SEQ ID NO:4
ggcgaaatgg ctacaggctg gaagaaaatc gctgataagt ggtactattt caacgaagaa

SEQ ID NO:13 - DNA sequence encoding the amino acid sequence of SEQ ID NO:5
ggtgccatga agacaggctg ggtcaagtac aaggacactt ggtactactt agacgctaaa gaa

SEQ ID NO:14 - DNA sequence encoding the amino acid sequence of SEQ ID NO:6
ggcgccatgg tatcaaatgc ctttatccag tcagcggacg gaacaggctg gtactacctc
aaaccagac

SEQ ID NO:15 - DNA sequence encoding the amino acid sequence of SEQ ID NO:7
ggctggcaga agaatgacac tggctactgg tacgtacatt cagacggctc ttatccaaaa 60
gacaagtttg agaaaatcaa tggcacttgg tactactttg acagttcagg ctatatgctt 120
gcagaccgct ggaggaagca cacagacggc aactggtact ggttcgacaa ctcaggcgaa 180
atggctacag gctggaagaa aatcgctgat aagtggtact atttcaacga agaaggtgcc 240
atgaagacag gctgggtcaa gtacaaggac acttggtact acttagacgc taaagaaggc 300
gccatggtat caaatgcctt tatccagtca gcggacggaa caggctggta ctacctcaaa 360
ccagacggaa cactggcaga caggccagaa ttcacagtag agccagatgg cttgattaca 420
gtaaaataa 429

SEQ ID NO:16 - DNA sequence encoding the amino acid sequence of SEQ ID NO:8

TACGTACATTCCGACGGCTCTTATCCAAAAGACAAGTTTGAGAAAATCAATGGCACTTGGTACTACTTTGACA
GTTCAGGCTATATGCTTGCAGACCGCTGGAGGAAGCACACAGACGGCAACTGGTACTGGTTCGACAACTCAGG
CGAAATGGCTACAGGCTGGAAGAAAATCGCTGATAAGTGGTACTATTTCAACGAAGAAGGTGCCATGAAGACA
GGCTGGGTCAAGTACAAGGACACTTGGTACTACTTAGACGCTAAAGAAGGCGCCATGGTATCAAATGCCTTTA
TCCAGTCAGCGGACGGAACAGGCTGGTACTACCTCAAACCAGAC

FIG. 2. CPC and native Constructs

Construct 1 - coding sequence of CPC-P501S HIS (see plasmid of figure 7 - Y1796)
Protein sequence (SEQ ID NO:27) 

R1 R2 R3 R4

MAAAYVHSDGSYPKDKFEKINGTWYYFDSSGYMLADRWRKHTDGNWYWFDNSGEMATG

R5 P2 R6

WKKIADKWYYFNEEGAMKTGWVKYKDTWYYLDAKEGAMQYIKANSKFIGITEGVMVSNAFIQS
ADGTGWYYLKPDGTLADRPEKFMYMVLGIGPVLGLVCVPLLGSASDHWRGRYGRRRPFIWALSL

GILLSLFLIPRAGWLAGLLCPDPRPLELALLILGVGLLDFCGQVCFTPLEALLSDLFRDPDHCRQAYSV

YAFMISLGGCLGYLLPAIDWDTSALAPYLGTQEECLFGLLTLIFLTCVAATLLVAEEAALGPTEPAEG

LSAPSLSPHCCPCRARLAFRNLGALLPRLHQLCCRMPRTLRRLFVAELCSWMALMTFTLFYTDFVGE

GLYQGVPRAEPGTEARRHYDEGVRMGSLGLFLQCAISLVFSLVMDRLVQRFGTRAVYLASVAAFPV

AAGATCLSHSVAVVTASAALTGFTFSALQILPYTLASLYHREKQVFLPKYRGDTGGASSEDSLMTSF

LPGPKPGAPFPNGHVGAGGSGLLPPPPALCGASACDVSVRVVVGEPTEARVVPGRGICLDLAILDSAF

LLSQVAPSLFMGSIVQLSQSVTAYMVSAAGLGLVAIYFATQVVFDKSDLAKYSAGGHHHHHH

R1 (plain): aa5-9 (fragment) R4 (bold): aa53-72 P2 (underline): 97-110

R2 (bold): aa 10-30 R5 (plain): aa73-93 

R3 (plain): aa31-52 R6a (bold): aa94-95 R6b (bold): 113-133 
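
The legend ranges are 1-based and inclusive, so each segment can be recovered by slicing. The sketch below is an illustration rather than part of the original disclosure; it takes the first 133 residues of the Construct 1 protein as they read in FIG. 4, and the helper name `segment` is hypothetical.

```python
# First 133 residues of the Construct 1 protein (SEQ ID NO:27), as read from FIG. 4.
N_TERM = (
    "MAAAYVHSDGSYPKDKFEKINGTWYYFDSSGYMLADRWRKHTDGNWYWFDNSGEMATGWK"
    "KIADKWYYFNEEGAMKTGWVKYKDTWYYLDAKEGAMQYIKANSKFIGITEGVMVSNAFIQ"
    "SADGTGWYYLKPD"
)

# Ranges copied from the legend (1-based, inclusive).
LEGEND = {
    "R1": (5, 9), "R2": (10, 30), "R3": (31, 52), "R4": (53, 72),
    "R5": (73, 93), "R6a": (94, 95), "P2": (97, 110), "R6b": (113, 133),
}


def segment(protein: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive)."""
    return protein[start - 1:end]


parts = {name: segment(N_TERM, *rng) for name, rng in LEGEND.items()}
for name, seq in parts.items():
    print(f"{name:>3} {seq}")
```

Joining `parts["R6a"]` and `parts["R6b"]` gives back repeat 6 of C-LytA, which shows that the P2 peptide is placed inside that repeat rather than between two repeats.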

Nucleotide sequence (SEQ ID NO:28) 

ATGgcggCCgctTACGTACATTCCGACGGCTCTTATCCAAAAGACAAGTTTGAGAAAATCAATGGCACTTGGT 
ACTACTTTGACAGTTCAGGCTATATGCTTGCAGACCGCTGGAGGAAGCACACAGACGGCAACTGGTACTGGTT 
CGACAACTCAGGCGAAATGGCTACAGGCTGGAAGAAAATCGCTGATAAGTGGTACTATTTCAACGAAGAAGGT 
GCCATGAAGACAGGCTGGGTCAAGTACAAGGACACTTGGTACTACTTAGACGCTAAAGAAGGCGCCatgcaat
acatcaaggctaactctaagttcattggtatcactgaaggcgtcATGGTATCAAATGCCTTTATCCAGTCAGC
GGACGGAACAGGCTGGTACTACCTCAAACCAGACGGAACACTGGCAGACAGGCCAGAAaagttcatgtaCatg 
GTGCTGGGCATTGGTCCAGTGCTGGGCCTGGTCTGTGTCCCGCTCCTAGGCTCAGCCAGTGACCACTGGCGTG 
GACGCTATGGCCGCCGCCGGCCCTTCATCTGGGCACTGTCCTTGGGCATCCTGCTGAGCCTCTTTCTCATCCC 
AAGGGCCGGCTGGCTAGCAGGGCTGCTGTGCCCGGATCCCAGGCCCCTGGAGCTGGCACTGCTCATCCTGGGC
GTGGGGCTGCTGGACTTCTGTGGCCAGGTGTGCTTCACTCCACTGGAGGCCCTGCTCTCTGACCTCTTCCGGG 
ACCCGGACCACTGTCGCCAGGCCTACTCTGTCTATGCCTTCATGATCAGTCTTGGGGGCTGCCTGGGCTACCT 
CCTGCCTGCCATTGACTGGGACACCAGTGCCCTGGCCCCCTACCTGGGCACCCAGGAGGAGTGCCTCTTTGGC 
CTGCTCACCCTCATCTTCCTCACCTGCGTAGCAGCCACACTGCTGGTGGCTGAGGAGGCAGCGCTGGGCCCCA 
CCGAGCCAGCAGAAGGGCTGTCGGCCCCCTCCTTGTCGCCCCACTGCTGTCCATGCCGGGCCCGCTTGGCTTT 
CCGGAACCTGGGCGCCCTGCTTCCCCGGCTGCACCAGCTGTGCTGCCGCATGCCCCGCACCCTGCGCCGGCTC
TTCGTGGCTGAGCTGTGCAGCTGGATGGCACTCATGACCTTCACGCTGTTTTACACGGATTTCGTGGGCGAGG 
GGCTGTACCAGGGCGTGCCCAGAGCTGAGCCGGGCACCGAGGCCCGGAGACACTATGATGAAGGCGTTCGGAT 
GGGCAGCCTGGGGCTGTTCCTGCAGTGCGCCATCTCCCTGGTCTTCTCTCTGGTCATGGACCGGCTGGTGCAG 
CGATTCGGCACTCGAGCAGTCTATTTGGCCAGTGTGGCAGCTTTCCCTGTGGCTGCCGGTGCCACATGCCTGT 
CCCACAGTGTGGCCGTGGTGACAGCTTCAGCCGCCCTCACCGGGTTCACCTTCTCAGCCCTGCAGATCCTGCC 
CTACACACTGGCCTCCCTCTACCACCGGGAGAAGCAGGTGTTCCTGCCCAAATACCGAGGGGACACTGGAGGT 
GCTAGCAGTGAGGACAGCCTGATGACCAGCTTCCTGCCAGGCCCTAAGCCTGGAGCTCCCTTCCCTAATGGAC 
ACGTGGGTGCTGGAGGCAGTGGCCTGCTCCCACCTCCACCCGCGCTCTGCGGGGCCTCTGCCTGTGAtGTCTC 
CGTACGTGTGGTGGTGGGTGAGCCCACCGAGGCCAGGGTGGTTCCGGGCCGGGGCATCTGCCTGGACCTCGCC 
ATCCTGGATAGTGCCTTCCTGCTGTCCCAGGTGGCCCCATCCCTGTTTATGGGCTCCATTGTCCAGCTCAGCC 
AGTCTGTCACTGCCTATATGGTGTCTGCCGCAGGCCTGGGTCTGGTCGCCATTTACTTTGCTACACAGGTAGT 
ATTTGACAAGAGCGACTTGGCCAAATACTCAGCGggtggacaccatcaccatcaccattaa



Construct 2 - Coding sequence of P501S HIS (control) (yeast strain SC333)
Protein sequence (SEQ ID NO:29)

MVLGIGPVLG LVCVPLLGSA SDHWRGRYGR RRPFIWALSL GILLSLFLIP RAGWLAGLLC 60
PDPRPLELAL LILGVGLLDF CGQVCFTPLE ALLSDLFRDP DHCRQAYSVY AFMISLGGCL 120
GYLLPAIDWD TSALAPYLGT QEECLFGLLT LIFLTCVAAT LLVAEEAALG PTEPAEGLSA 180
PSLSPHCCPC RARLAFRNLG ALLPRLHQLC CRMPRTLRRL FVAELCSWMA LMTFTLFYTD 240
FVGEGLYQGV PRAEPGTEAR RHYDEGVRMG SLGLFLQCAI SLVFSLVMDR LVQRFGTRAV 300
YLASVAAFPV AAGATCLSHS VAVVTASAAL TGFTFSALQI LPYTLASLYH REKQVFLPKY 360
RGDTGGASSE DSLMTSFLPG PKPGAPFPNG HVGAGGSGLL PPPPALCGAS ACDVSVRVVV 420
GEPTEARVVP GRGICLDLAI LDSAFLLSQV APSLFMGSIV QLSQSVTAYM VSAAGLGLVA 480
IYFATQVVFD KSDLAKYSAG GHHHHHH 507

Nucleotide sequence (SEQ ID NO:30)

atgGTGCTGG GCATTGGTCC AGTGCTGGGC CTGGTCTGTG TCCCGCTCCT AGGCTCAGCC 60
AGTGACCACT GGCGTGGACG CTATGGCCGC CGCCGGCCCT TCATCTGGGC ACTGTCCTTG 120
GGCATCCTGC TGAGCCTCTT TCTCATCCCA AGGGCCGGCT GGCTAGCAGG GCTGCTGTGC 180
CCGGATCCCA GGCCCCTGGA GCTGGCACTG CTCATCCTGG GCGTGGGGCT GCTGGACTTC 240
TGTGGCCAGG TGTGCTTCAC TCCACTGGAG GCCCTGCTCT CTGACCTCTT CCGGGACCCG 300
GACCACTGTC GCCAGGCCTA CTCTGTCTAT GCCTTCATGA TCAGTCTTGG GGGCTGCCTG 360
GGCTACCTCC TGCCTGCCAT TGACTGGGAC ACCAGTGCCC TGGCCCCCTA CCTGGGCACC 420
CAGGAGGAGT GCCTCTTTGG CCTGCTCACC CTCATCTTCC TCACCTGCGT AGCAGCCACA 480
CTGCTGGTGG CTGAGGAGGC AGCGCTGGGC CCCACCGAGC CAGCAGAAGG GCTGTCGGCC 540
CCCTCCTTGT CGCCCCACTG CTGTCCATGC CGGGCCCGCT TGGCTTTCCG GAACCTGGGC 600
GCCCTGCTTC CCCGGCTGCA CCAGCTGTGC TGCCGCATGC CCCGCACCCT GCGCCGGCTC 660
TTCGTGGCTG AGCTGTGCAG CTGGATGGCA CTCATGACCT TCACGCTGTT TTACACGGAT 720
TTCGTGGGCG AGGGGCTGTA CCAGGGCGTG CCCAGAGCTG AGCCGGGCAC CGAGGCCCGG 780
AGACACTATG ATGAAGGCGT TCGGATGGGC AGCCTGGGGC TGTTCCTGCA GTGCGCCATC 840
TCCCTGGTCT TCTCTCTGGT CATGGACCGG CTGGTGCAGC GATTCGGCAC TCGAGCAGTC 900
TATTTGGCCA GTGTGGCAGC TTTCCCTGTG GCTGCCGGTG CCACATGCCT GTCCCACAGT 960
GTGGCCGTGG TGACAGCTTC AGCCGCCCTC ACCGGGTTCA CCTTCTCAGC CCTGCAGATC 1020
CTGCCCTACA CACTGGCCTC CCTCTACCAC CGGGAGAAGC AGGTGTTCCT GCCCAAATAC 1080
CGAGGGGACA CTGGAGGTGC TAGCAGTGAG GACAGCCTGA TGACCAGCTT CCTGCCAGGC 1140
CCTAAGCCTG GAGCTCCCTT CCCTAATGGA CACGTGGGTG CTGGAGGCAG TGGCCTGCTC 1200
CCACCTCCAC CCGCGCTCTG CGGGGCCTCT GCCTGTGAtG TCTCCGTACG TGTGGTGGTG 1260
GGTGAGCCCA CCGAGGCCAG GGTGGTTCCG GGCCGGGGCA TCTGCCTGGA CCTCGCCATC 1320
CTGGATAGTG CCTTCCTGCT GTCCCAGGTG GCCCCATCCC TGTTTATGGG CTCCATTGTC 1380
CAGCTCAGCC AGTCTGTCAC TGCCTATATG GTGTCTGCCG CAGGCCTGGG TCTGGTCGCC 1440
ATTTACTTTG CTACACAGGT AGTATTTGAC AAGAGCGACT TGGCCAAATA CTCAGCGggt 1500
ggacaccatc accatcacca ttaa 1524

Construct 3 - Coding sequence of natssP501i.^ P501si.ssa HIS (yeast strain Y1800) 
Protein sequence (SEQ ID NO:31) 

R1 R2

MAAVQRLWVSRLLRHRKAQLLLVNLLTFGLEVCLAAAYVHSDGSYPKDKFEKINGTW

R3 R4 R5

YYFDSSGYMLADRWRKHTDGNWYWFDNSGEMATGWKKIADKWYYFNEEGAMKTGWV

R6

KYKDTWYYLDAKEGAMQYIKANSKFIGITEGVMVSNAFIQSADGTGWYYLKPDGTLADRPEKFMY

MVLGIGPVLGLVCVPLLGSASDHWRGRYGRRRPFIWALSLGILLSLFLIPRAGWLAGLLCPDPRPLEL

ALLILGVGLLDFCGQVCFTPLEALLSDLFRDPDHCRQAYSVYAFMISLGGCLGYLLPAIDWDTSALAP

YLGTQEECLFGLLTLIFLTCVAATLLVAEEAALGPTEPAEGLSAPSLSPHCCPCRARLAFRNLGALLPR

LHQLCCRMPRTLRRLFVAELCSWMALMTFTLFYTDFVGEGLYQGVPRAEPGTEARRHYDEGVRMG

SLGLFLQCAISLVFSLVMDRLVQRFGTRAVYLASVAAFPVAAGATCLSHSVAVVTASAALTGFTFSA

LQILPYTLASLYHREKQVFLPKYRGDTGGASSEDSLMTSFLPGPKPGAPFPNGHVGAGGSGLLPPPPA

LCGASACDVSVRVVVGEPTEARVVPGRGICLDLAILDSAFLLSQVAPSLFMGSIVQLSQSVTAYMVS

AAGLGLVAIYFATQVVFDKSDLAKYSAGGHHHHHH

R1 (plain): aa38-42 (fragment) R4 (bold): aa77-106 P2 (underline): 130-143 

R2 (bold): aa43-64 R5 (plain): aa107-126 

R3 (plain): aa65-76 R6a (bold): aa127-128 R6b (bold): aa146-166 
natss stands for native signal sequence
Nucleotide sequence (SEQ ID NO:32) 

ATGgCGGCCGTGCAGAGGCTATGGGTATCGAGACTGCTAAGACACCGCAAAGCTCAGTTGTTGTTGGTTAACT
TGTTGACCTTCGGGCTGGAAGTCTGTTTGGCggccgctTACGTACATTCCGACGGCTCTTATCCAAAAGACAA 
GTTTGAGAAAATCAATGGCACTTGGTACTACTTTGACAGTTCAGGCTATATGCTTGCAGACCGCTGGAGGAAG 
CACACAGACGGCAACTGGTACTGGTTCGACAACTCAGGCGAAATGGCTACAGGCTGGAAGAAAATCGCTGATA 
AGTGGTACTATTTCAACGAAGAAGGTGCCATGAAGACAGGCTGGGTCAAGTACAAGGACACTTGGTACTACTT 
AGACGCTAAAGAAGGCGCCatgcaatacatcaaggctaactctaagttcattggtatcactgaaggcgtcATG
GTATCAAATGCCTTTATCCAGTCAGCGGACGGAACAGGCTGGTACTACCTCAAACCAGACGGAACACTGGCAG 
ACAGGCCAGAAaagttcatgtaCatgGTGCTGGGCATTGGTCCAGTGCTGGGCCTGGTCTGTGTCCCGCTCCT 
AGGCTCAGCCAGTGACCACTGGCGTGGACGCTATGGCCGCCGCCGGCCCTTCATCTGGGCACTGTCCTTGGGC 
ATCCTGCTGAGCCTCTTTCTCATCCCAAGGGCCGGCTGGCTAGCAGGGCTGCTGTGCCCGGATCCCAGGCCCC 
TGGAGCTGGCACTGCTCATCCTGGGCGTGGGGCTGCTGGACTTCTGTGGCCAGGTGTGCTTCACTCCACTGGA 
GGCCCTGCTCTCTGACCTCTTCCGGGACCCGGACCACTGTCGCCAGGCCTACTCTGTCTATGCCTTCATGATC 
AGTCTTGGGGGCTGCCTGGGCTACCTCCTGCCTGCCATTGACTGGGACACCAGTGCCCTGGCCCCCTACCTGG 
GCACCCAGGAGGAGTGCCTCTTTGGCCTGCTCACCCTCATCTTCCTCACCTGCGTAGCAGCCACACTGCTGGT 
GGCTGAGGAGGCAGCGCTGGGCCCCACCGAGCCAGCAGAAGGGCTGTCGGCCCCCTCCTTGTCGCCCCACTGC 
TGTCCATGCCGGGCCCGCTTGGCTTTCCGGAACCTGGGCGCCCTGCTTCCCCGGCTGCACCAGCTGTGCTGCC 
GCATGCCCCGCACCCTGCGCCGGCTCTTCGTGGCTGAGCTGTGCAGCTGGATGGCACTCATGACCTTCACGCT 
GTTTTACACGGATTTCGTGGGCGAGGGGCTGTACCAGGGCGTGCCCAGAGCTGAGCCGGGCACCGAGGCCCGG 
AGACACTATGATGAAGGCGTTCGGATGGGCAGCCTGGGGCTGTTCCTGCAGTGCGCCATCTCCCTGGTCTTCT 
CTCTGGTCATGGACCGGCTGGTGCAGCGATTCGGCACTCGAGCAGTCTATTTGGCCAGTGTGGCAGCTTTCCC 
TGTGGCTGCCGGTGCCACATGCCTGTCCCACAGTGTGGCCGTGGTGACAGCTTCAGCCGCCCTCACCGGGTTC 
ACCTTCTCAGCCCTGCAGATCCTGCCCTACACACTGGCCTCCCTCTACCACCGGGAGAAGCAGGTGTTCCTGC 
CCAAATACCGAGGGGACACTGGAGGTGCTAGCAGTGAGGACAGCCTGATGACCAGCTTCCTGCCAGGCCCTAA
GCCTGGAGCTCCCTTCCCTAATGGACACGTGGGTGCTGGAGGCAGTGGCCTGCTCCCACCTCCACCCGCGCTC 
TGCGGGGCCTCTGCCTGTGAtGTCTCCGTACGTGTGGTGGTGGGTGAGCCCACCGAGGCCAGGGTGGTTCCGG 
GCCGGGGCATCTGCCTGGACCTCGCCATCCTGGATAGTGCCTTCCTGCTGTCCCAGGTGGCCCCATCCCTGTT 
TATGGGCTCCATTGTCCAGCTCAGCCAGTCTGTCACTGCCTATATGGTGTCTGCCGCAGGCCTGGGTCTGGTC 
GCCATTTACTTTGCTACACAGGTAGTATTTGACAAGAGCGACTTGGCCAAATACTCAGCGggtggacaccatc 
accatcaccattaa 

Construct 4 - Coding sequence of alphapreCPC-P501S HIS (yeast strain Y1802)
Protein sequence (SEQ ID NO:33) 

Alpha-pre signal R1 R2 R3

MAARFPSIFTAVLFAASSALAAAYVHSDGSYPKDKFEKINGTWYYFDSSGYMLADRWRKHTDGNWYWFD




[NSGgMATGWKKIAPKWYYFireEGAMKTGWKYKPT^^ 
LIPRAGWLAGLLCPDPRPLELALLILGVGLLDFCGQVCFTPLEALLSDLFRDPDHCRQAYSVYAFMISLGGCL
GYLLPAIDWDTSALAPYLGTQEECLFGLLTLIFLTCVAATLLVAEEAALGPTEPAEGLSAPSLSPHCCPCRAR 
LAFRNLGALLPRLHQLCCRMPRTLRRLFVAELCSWMALMTFTLFYTDFVGEGLYQGVPRAEPGTEARRHYDEG 
VRMGSLGLFLQCAISLVFSLVMDRLVQRFGTRAVYLASVAAFPVAAGATCLSHSVAVVTASAALTGFTFSALQ 
ILPYTLASLYHREKQVFLPKYRGDTGGASSEDSLMTSFLPGPKPGAPFPNGHVGAGGSGLLPPPPALCGASAC 
DVSVRVVVGEPTEARVVPGRGICLDLAILDSAFLLSQVAPSLFMGSIVQLSQSVTAYMVSAAGLGLVAIYFAT
QVVFDKSDLAKYSAGGHHHHHH

Alpha-pre signal (bold): aa4-22

R1 (plain): aa24-28 (fragment) R4 (bold): aa72-91 P2 (underline): 116-129

R2 (bold): aa29-49 R5 (plain): aa92-112

R3 (plain): aa50-71 R6a (bold): aa113-114 R6b (bold): aa132-152

Alphapre stands for alpha pre signal sequence
Nucleotide sequence (SEQ ID NO:34)

TACGTACATTCCGACGGCTCTTATCCAAAAGACAAGTTTGAGAAAATCAATGGCACTTGGTACTACTTTGACA
GTTCAGGCTATATGCTTGCAGACCGCTGGAGGAAGCACACAGACGGCAACTGGTACTGGTTCGACAACTCAGG
CGAAATGGCTACAGGCTGGAAGAAAATCGCTGATAAGTGGTACTATTTCAACGAAGAAGGTGCCATGAAGACA

actctaagttcattggtatcactgaaggcgtcATGGTATCAAATGCCTTTATCCAGTCAGCGGACGGAACAGG
CTGGTACTACCTCAAACCAGACGGAACACTGGCAGACAGGCCAGAA

ATGgcGGCCAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAGCggCcgctTACG
TACATTCCGACGGCTCTTATCCAAAAGACAAGTTTGAGAAAATCAATGGCACTTGGTACTACTTTGACAGTTC
AGGCTATATGCTTGCAGACCGCTGGAGGAAGCACACAGACGGCAACTGGTACTGGTTCGACAACTCAGGCGAA
ATGGCTACAGGCTGGAAGAAAATCGCTGATAAGTGGTACTATTTCAACGAAGAAGGTGCCATGAAGACAGGCT
GGGTCAAGTACAAGGACACTTGGTACTACTTAGACGCTAAAGAAGGCGCCatgcaatacatcaaggctaactc
taagttcattggtatcactgaaggcgtcATGGTATCAAATGCCTTTATCCAGTCAGCGGACGGAACAGGCTGG
TACTACCTCAAACCAGACGGAACACTGGCAGACAGGCCAGAAgctggtattacttacgttccaccattgttgt
tggaagttggtgttgaagaaaagttcatgtaCatgGTGCTGGGCATTGGTCCAGTGCTGGGCCTGGTCTGTGT
CCCGCTCCTAGGCTCAGCCAGTGACCACTGGCGTGGACGCTATGGCCGCCGCCGGCCCTTCATCTGGGCACTG
TCCTTGGGCATCCTGCTGAGCCTCTTTCTCATCCCAAGGGCCGGCTGGCTAGCAGGGCTGCTGTGCCCGGATC
CCAGGCCCCTGGAGCTGGCACTGCTCATCCTGGGCGTGGGGCTGCTGGACTTCTGTGGCCAGGTGTGCTTCAC
TCCACTGGAGGCCCTGCTCTCTGACCTCTTCCGGGACCCGGACCACTGTCGCCAGGCCTACTCTGTCTATGCT
TCATGATCAGTCTTGGGGGCTGCCTGGGCTACCTCCTGCCTGCCATTGACTGGGACACCAGTGCCCTGGCCCC
CTACCTGGGCACCCAGGAGGAGTGCCTCTTTGGCCTGCTCACCCTCATCTTCCTCACCTGCGTAGCAGCCACA
CTGCTGGTGGCTGAGGAGGCAGCGCTGGGCCCCACCGAGCCAGCAGAAGGGCTGTCGGCCCCCTCCTTGTCGC
CCCACTGCTGTCCATGCCGGGCCCGCTTGGCTTTCCGGAACCTGGGCGCCCTGCTTCCCCGGCTGCACCAGCT 
GTGCTGCCGCATGCCCCGCACCCTGCGCCGGCTCTTCGTGGCTGAGCTGTGCAGCTGGATGGCACTCATGACC 
TTCACGCTGTTTTACACGGATTTCGTGGGCGAGGGGCTGTACCAGGGCGTGCCCAGAGCTGAGCCGGGCACCG 
AGGCCCGGAGACACTATGATGAAGGCGTTCGGATGGGCAGCCTGGGGCTGTTCCTGCAGTGCGCCATCTCCCT
GGTCTTCTCTCTGGTCATGGACCGGCTGGTGCAGCGATTCGGCACTCGAGCAGTCTATTTGGCCAGTGTGGCA
GCTTTCCCTGTGGCTGCCGGTGCCACATGCCTGTCCCACAGTGTGGCCGTGGTGACAGCTTCAGCCGCCCTCA
CCGGGTTCACCTTCTCAGCCCTGCAGATCCTGCCCTACACACTGGCCTCCCTCTACCACCGGGAGAAGCAGGT
GTTCCTGCCCAAATACCGAGGGGACACTGGAGGTGCTAGCAGTGAGGACAGCCTGATGACCAGCTTCCTGCCA
GGCCCTAAGCCTGGAGCTCCCTTCCCTAATGGACACGTGGGTGCTGGAGGCAGTGGCCTGCTCCCACCTCCAC
CCGCGCTCTGCGGGGCCTCTGCCTGTGAtGTCTCCGTACGTGTGGTGGTGGGTGAGCCCACCGAGGCCAGGGT
GGTTCCGGGCCGGGGCATCTGCCTGGACCTCGCCATCCTGGATAGTGCCTTCCTGCTGTCCCAGGTGGCCCCA
TCCCTGTTTATGGGCTCCATTGTCCAGCTCAGCCAGTCTGTCACTGCCTATATGGTGTCTGCCGCAGGCCTGG
GTCTGGTCGCCATTTACTTTGCTACACAGGTAGTATTTGACAAGAGCGACTTGGCCAAATACTCAGCGggtgg
acaccatcaccatcaccattaa 

Construct 5 - Coding sequence of alphaprepro-P501S HIS (in plasmid pRIT15068 and yeast strain Y1790)

Protein sequence (SEQ ID NO:35)

MSFLNFTAVL FAASSALAAP VNTTTEDETA QIPAEAVIGY SDLEGDFDVA VLPFSNSTNN 60
GLLFINTTIA SIAAKEEGVS LEKREAEAMV LGIGPVLGLV CVPLLGSASD HWRGRYGRRR 120
PFIWALSLGI LLSLFLIPRA GWLAGLLCPD PRPLELALLI LGVGLLDFCG QVCFTPLEAL 180
LSDLFRDPDH CRQAYSVYAF MISLGGCLGY LLPAIDWDTS ALAPYLGTQE ECLFGLLTLI 240
FLTCVAATLL VAEEAALGPT EPAEGLSAPS LSPHCCPCRA RLAFRNLGAL LPRLHQLCCR 300
MPRTLRRLFV AELCSWMALM TFTLFYTDFV GEGLYQGVPR AEPGTEARRH YDEGVRMGSL 360
GLFLQCAISL VFSLVMDRLV QRFGTRAVYL ASVAAFPVAA GATCLSHSVA VVTASAALTG 420
FTFSALQILP YTLASLYHRE KQVFLPKYRG DTGGASSEDS LMTSFLPGPK PGAPFPNGHV 480
GAGGSGLLPP PPALCGASAC DVSVRVVVGE PTEARVVPGR GICLDLAILD SAFLLSQVAP 540
SLFMGSIVQL SQSVTAYMVS AAGLGLVAIY FATQVVFDKS DLAKYSAGGH HHHHH 595

Nucleotide sequence (SEQ ID NO:36)

ATGAGTTTCC TCAATTTTAC TGCAGTTTTA TTCGCAGCAT CCTCCGCATT AGCTGCTCCA 60
GTCAACACTA CAACAGAAGA TGAAACGGCA CAAATTCCGG CTGAAGCTGT CATCGGTTAC 120
TCAGATTTAG AAGGGGATTT CGATGTTGCT GTTTTGCCAT TTTCCAACAG CACAAATAAC 180
GGGTTATTGT TTATAAATAC TACTATTGCC AGCATTGCTG CTAAAGAAGA AGGGGTATCT 240
CTCGAGAAAA GAGAGGCTGA AGCCatgGTG CTGGGCATTG GTCCAGTGCT GGGCCTGGTC 300
TGTGTCCCGC TCCTAGGCTC AGCCAGTGAC CACTGGCGTG GACGCTATGG CCGCCGCCGG 360
CCCTTCATCT GGGCACTGTC CTTGGGCATC CTGCTGAGCC TCTTTCTCAT CCCAAGGGCC 420
GGCTGGCTAG CAGGGCTGCT GTGCCCGGAT CCCAGGCCCC TGGAGCTGGC ACTGCTCATC 480
CTGGGCGTGG GGCTGCTGGA CTTCTGTGGC CAGGTGTGCT TCACTCCACT GGAGGCCCTG 540
CTCTCTGACC TCTTCCGGGA CCCGGACCAC TGTCGCCAGG CCTACTCTGT CTATGCCTTC 600
ATGATCAGTC TTGGGGGCTG CCTGGGCTAC CTCCTGCCTG CCATTGACTG GGACACCAGT 660
GCCCTGGCCC CCTACCTGGG CACCCAGGAG GAGTGCCTCT TTGGCCTGCT CACCCTCATC 720
TTCCTCACCT GCGTAGCAGC CACACTGCTG GTGGCTGAGG AGGCAGCGCT GGGCCCCACC 780
GAGCCAGCAG AAGGGCTGTC GGCCCCCTCC TTGTCGCCCC ACTGCTGTCC ATGCCGGGCC 840
CGCTTGGCTT TCCGGAACCT GGGCGCCCTG CTTCCCCGGC TGCACCAGCT GTGCTGCCGC 900
ATGCCCCGCA CCCTGCGCCG GCTCTTCGTG GCTGAGCTGT GCAGCTGGAT GGCACTCATG 960
ACCTTCACGC TGTTTTACAC GGATTTCGTG GGCGAGGGGC TGTACCAGGG CGTGCCCAGA 1020
GCTGAGCCGG GCACCGAGGC CCGGAGACAC TATGATGAAG GCGTTCGGAT GGGCAGCCTG 1080
GGGCTGTTCC TGCAGTGCGC CATCTCCCTG GTCTTCTCTC TGGTCATGGA CCGGCTGGTG 1140
CAGCGATTCG GCACTCGAGC AGTCTATTTG GCCAGTGTGG CAGCTTTCCC TGTGGCTGCC 1200
GGTGCCACAT GCCTGTCCCA CAGTGTGGCC GTGGTGACAG CTTCAGCCGC CCTCACCGGG 1260
TTCACCTTCT CAGCCCTGCA GATCCTGCCC TACACACTGG CCTCCCTCTA CCACCGGGAG 1320
AAGCAGGTGT TCCTGCCCAA ATACCGAGGG GACACTGGAG GTGCTAGCAG TGAGGACAGC 1380
CTGATGACCA GCTTCCTGCC AGGCCCTAAG CCTGGAGCTC CCTTCCCTAA TGGACACGTG 1440
GGTGCTGGAG GCAGTGGCCT GCTCCCACCT CCACCCGCGC TCTGCGGGGC CTCTGCCTGT 1500
GAtGTCTCCG TACGTGTGGT GGTGGGTGAG CCCACCGAGG CCAGGGTGGT TCCGGGCCGG 1560
GGCATCTGCC TGGACCTCGC CATCCTGGAT AGTGCCTTCC TGCTGTCCCA GGTGGCCCCA 1620
TCCCTGTTTA TGGGCTCCAT TGTCCAGCTC AGCCAGTCTG TCACTGCCTA TATGGTGTCT 1680
GCCGCAGGCC TGGGTCTGGT CGCCATTTAC TTTGCTACAC AGGTAGTATT TGACAAGAGC 1740
GACTTGGCCA AATACTCAGC Gggtggacac catcaccatc accattaa 1788

FIG. 3. Structure of CPC-p501 His fusion protein expressed in S. cerevisiae

Clyta repeats 

P2 peptide 
P501 sequences 
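
The three labelled parts of the fusion (C-LytA repeats, P2 peptide, P501 sequences) can be laid out as a table of contiguous residue ranges over the 652-residue protein. The sketch below is only an illustration. Ranges for the repeats and P2 follow the FIG. 2 legend for Construct 1; the remaining segment names and boundaries are this sketch's own reading of the sequence and are assumptions, not the patent's wording.

```python
# Segment layout of the 652-residue CPC-P501 His protein (1-based, inclusive).
# Repeat and P2 ranges follow the FIG. 2 legend; other labels are assumptions.
SEGMENTS = [
    ("N-terminal MAAA", 1, 4),
    ("C-LytA repeats R1 (fragment) to R6a", 5, 95),
    ("junction residue", 96, 96),
    ("P2 peptide", 97, 110),
    ("junction residues", 111, 112),
    ("C-LytA repeat R6b", 113, 133),
    ("C-LytA tail and P501 sequences", 134, 644),
    ("GG spacer", 645, 646),
    ("His tag (6 x His)", 647, 652),
]

TOTAL = 652
lengths = [end - start + 1 for _, start, end in SEGMENTS]

# Each segment must start right after the previous one ends.
contiguous = all(
    SEGMENTS[i][2] + 1 == SEGMENTS[i + 1][1] for i in range(len(SEGMENTS) - 1)
)

for (name, start, end), n in zip(SEGMENTS, lengths):
    print(f"{start:>3}-{end:<3} {n:>3} aa  {name}")
print(sum(lengths) == TOTAL, contiguous)
```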
FIG. 4. Primary structure of CPC-P501 His fusion protein (SEQ ID NO.41)



MAAAYVHSDG SYPKDKFEKI NGTWYYFDSS GYMLADRWRK HTDGNWYWFD NSGEMATGWK 60
KIADKWYYFN EEGAMKTGWV KYKDTWYYLD AKEGAMQYIK ANSKFIGITE GVMVSNAFIQ 120
SADGTGWYYL KPDGTLADRP EKFMYMVLGI GPVLGLVCVP LLGSASDHWR GRYGRRRPFI 180
WALSLGILLS LFLIPRAGWL AGLLCPDPRP LELALLILGV GLLDFCGQVC FTPLEALLSD 240
LFRDPDHCRQ AYSVYAFMIS LGGCLGYLLP AIDWDTSALA PYLGTQEECL FGLLTLIFLT 300
CVAATLLVAE EAALGPTEPA EGLSAPSLSP HCCPCRARLA FRNLGALLPR LHQLCCRMPR 360
TLRRLFVAEL CSWMALMTFT LFYTDFVGEG LYQGVPRAEP GTEARRHYDE GVRMGSLGLF 420
LQCAISLVFS LVMDRLVQRF GTRAVYLASV AAFPVAAGAT CLSHSVAVVT ASAALTGFTF 480
SALQILPYTL ASLYHREKQV FLPKYRGDTG GASSEDSLMT SFLPGPKPGA PFPNGHVGAG 540
GSGLLPPPPA LCGASACDVS VRVVVGEPTE ARVVPGRGIC LDLAILDSAF LLSQVAPSLF 600
MGSIVQLSQS VTAYMVSAAG LGLVAIYFAT QVVFDKSDLA KYSAGGHHHH HH 652
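
The listing convention used for this figure (blocks of ten residues, six blocks per row, with the running position of the last residue at the end of each row) can be reproduced and inverted with a few lines of Python. This is an illustrative sketch; `format_blocks` and `parse_blocks` are hypothetical helper names, and the demo string is the first 70 residues of the sequence above.

```python
def format_blocks(seq: str, block: int = 10, per_row: int = 6) -> str:
    """Lay out a sequence in space-separated blocks with a running end count."""
    row_len = block * per_row
    rows = []
    for start in range(0, len(seq), row_len):
        chunk = seq[start:start + row_len]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        rows.append(f"{' '.join(blocks)} {start + len(chunk)}")
    return "\n".join(rows)


def parse_blocks(text: str) -> str:
    """Inverse of format_blocks: drop whitespace and the numeric counters."""
    return "".join(tok for tok in text.split() if not tok.isdigit())


demo = "MAAAYVHSDGSYPKDKFEKINGTWYYFDSSGYMLADRWRKHTDGNWYWFDNSGEMATGWKKIADKWYYFN"
laid_out = format_blocks(demo)
print(laid_out)
```

Parsing a listing back to a plain string and comparing its length with the final counter is a quick way to detect dropped or duplicated characters in a block.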
FIG. 5. Nucleotide sequence of CPC P501 His (pRIT15201) (SEQ ID NO.42)



ATGGCGGCCG CTTACGTACA TTCCGACGGC TCTTATCCAA AAGACAAGTT TGAGAAAATC 60
AATGGCACTT GGTACTACTT TGACAGTTCA GGCTATATGC TTGCAGACCG CTGGAGGAAG 120
CACACAGACG GCAACTGGTA CTGGTTCGAC AACTCAGGCG AAATGGCTAC AGGCTGGAAG 180
AAAATCGCTG ATAAGTGGTA CTATTTCAAC GAAGAAGGTG CCATGAAGAC AGGCTGGGTC 240
AAGTACAAGG ACACTTGGTA CTACTTAGAC GCTAAAGAAG GCGCCATGCA ATACATCAAG 300
GCTAACTCTA AGTTCATTGG TATCACTGAA GGCGTCATGG TATCAAATGC CTTTATCCAG 360
TCAGCGGACG GAACAGGCTG GTACTACCTC AAACCAGACG GAACACTGGC AGACAGGCCA 420
GAAAAGTTCA TGTACATGGT GCTGGGCATT GGTCCAGTGC TGGGCCTGGT CTGTGTCCCG 480
CTCCTAGGCT CAGCCAGTGA CCACTGGCGT GGACGCTATG GCCGCCGCCG GCCCTTCATC 540
TGGGCACTGT CCTTGGGCAT CCTGCTGAGC CTCTTTCTCA TCCCAAGGGC CGGCTGGCTA 600
GCAGGGCTGC TGTGCCCGGA TCCCAGGCCC CTGGAGCTGG CACTGCTCAT CCTGGGCGTG 660
GGGCTGCTGG ACTTCTGTGG CCAGGTGTGC TTCACTCCAC TGGAGGCCCT GCTCTCTGAC 720
CTCTTCCGGG ACCCGGACCA CTGTCGCCAG GCCTACTCTG TCTATGCCTT CATGATCAGT 780
CTTGGGGGCT GCCTGGGCTA CCTCCTGCCT GCCATTGACT GGGACACCAG TGCCCTGGCC 840
CCCTACCTGG GCACCCAGGA GGAGTGCCTC TTTGGCCTGC TCACCCTCAT CTTCCTCACC 900
TGCGTAGCAG CCACACTGCT GGTGGCTGAG GAGGCAGCGC TGGGCCCCAC CGAGCCAGCA 960
GAAGGGCTGT CGGCCCCCTC CTTGTCGCCC CACTGCTGTC CATGCCGGGC CCGCTTGGCT 1020
TTCCGGAACC TGGGCGCCCT GCTTCCCCGG CTGCACCAGC TGTGCTGCCG CATGCCCCGC 1080
ACCCTGCGCC GGCTCTTCGT GGCTGAGCTG TGCAGCTGGA TGGCACTCAT GACCTTCACG 1140
CTGTTTTACA CGGATTTCGT GGGCGAGGGG CTGTACCAGG GCGTGCCCAG AGCTGAGCCG 1200
GGCACCGAGG CCCGGAGACA CTATGATGAA GGCGTTCGGA TGGGCAGCCT GGGGCTGTTC 1260
CTGCAGTGCG CCATCTCCCT GGTCTTCTCT CTGGTCATGG ACCGGCTGGT GCAGCGATTC 1320
GGCACTCGAG CAGTCTATTT GGCCAGTGTG GCAGCTTTCC CTGTGGCTGC CGGTGCCACA 1380
TGCCTGTCCC ACAGTGTGGC CGTGGTGACA GCTTCAGCCG CCCTCACCGG GTTCACCTTC 1440
TCAGCCCTGC AGATCCTGCC CTACACACTG GCCTCCCTCT ACCACCGGGA GAAGCAGGTG 1500
TTCCTGCCCA AATACCGAGG GGACACTGGA GGTGCTAGCA GTGAGGACAG CCTGATGACC 1560
AGCTTCCTGC CAGGCCCTAA GCCTGGAGCT CCCTTCCCTA ATGGACACGT GGGTGCTGGA 1620
GGCAGTGGCC TGCTCCCACC TCCACCCGCG CTCTGCGGGG CCTCTGCCTG TGATGTCTCC 1680
GTACGTGTGG TGGTGGGTGA GCCCACCGAG GCCAGGGTGG TTCCGGGCCG GGGCATCTGC 1740
CTGGACCTCG CCATCCTGGA TAGTGCCTTC CTGCTGTCCC AGGTGGCCCC ATCCCTGTTT 1800
ATGGGCTCCA TTGTCCAGCT CAGCCAGTCT GTCACTGCCT ATATGGTGTC TGCCGCAGGC 1860
CTGGGTCTGG TCGCCATTTA CTTTGCTACA CAGGTAGTAT TTGACAAGAG CGACTTGGCC 1920
AAATACTCAG CGGGTGGACA CCATCACCAT CACCATTAA 1959
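
The end counters printed with the protein and nucleotide listings can be cross-checked against each other: a coding sequence that ends in a stop codon has three nucleotides per residue plus three for the stop. The sketch below is illustrative; the dictionary keys are descriptive labels chosen here, and the numbers are the counters printed at the end of the corresponding listings.

```python
# (protein length in residues, coding sequence length in nucleotides),
# taken from the counters at the end of each listing.
LISTINGS = {
    "Construct 2 (SEQ ID NO:29/30)": (507, 1524),
    "Construct 5 (SEQ ID NO:35/36)": (595, 1788),
    "CPC-P501 His (SEQ ID NO:41/42)": (652, 1959),
}


def expected_cds_length(n_residues: int) -> int:
    """Three nucleotides per residue plus one stop codon."""
    return 3 * (n_residues + 1)


for name, (aa, nt) in LISTINGS.items():
    print(name, aa, nt, expected_cds_length(aa) == nt)
```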
FIG. 6. Cloning strategy for generation of plasmid pRIT15201



[Plasmid maps; only the recoverable text labels are listed]

Restriction sites labelled: NdeI, NcoI, SphI, SacI, BamHI, AflIII
Plasmid elements labelled: pCUP1, tARG3, LEU2, pBR327

Hybridized oligos P21/P22

5' catgcaatacatcaaggctaactctaagttcattggtatcactgaaggcgt 3'
3' gttatgtagttccgattgagattcaagtaaccatagtgacttccgcagtac 5'

NcoI digestion

C-lytA_P2_C-lytA: NdeI, (NcoI), (NcoI), SphI

PCR amplification using

CLYTANOTATG = 5' aaaaccatggcggccgcttacgtacattccgac ... atccaaaagacaag 3'
CLYTA-aa55 = 5' aaacatgtacatgaacttttctggcctgtctgccagtgttc 3'

NcoI + AflIII digestion

C-lytA_P2_C-lytA-P501 aa51-: NcoI, (NcoI), NcoI (ATG), SacI, BamHI

P501S aa51-aa553 HIS
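
The two hybridized oligos P21/P22 of FIG. 6 can be checked for complementarity in a few lines. The sketch below is illustrative only. It assumes the lower strand is printed 3' to 5' (as the figure labels indicate) and that each strand carries a four-base 5' overhang; the helper name `reverse_complement` is hypothetical.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Both strands written 5'->3'. The lower strand is printed 3'->5' in the
# figure, so it is reversed here.
top = "catgcaatacatcaaggctaactctaagttcattggtatcactgaaggcgt"
bottom = "gttatgtagttccgattgagattcaagtaaccatagtgacttccgcagtac"[::-1]

overhang = 4  # assumed length of the single-stranded 5' end on each strand
duplex_top = top[overhang:]        # base-paired part of the upper strand
duplex_bottom = bottom[overhang:]  # base-paired part of the lower strand

print(duplex_top == reverse_complement(duplex_bottom))
print(top[:overhang], bottom[:overhang])
```

Under these assumptions the annealed duplex has a 47-base paired region flanked by two `catg` 5' overhangs, which is the overhang left by NcoI digestion.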
FIG. 7. Plasmid map of pRIT15201
FIG. 8. Comparative expression of CPC P501 and P501 in S. cerevisiae strain DCS (gel Laemmli 10%)



1234567 1234567 




Silver staining Western blot anti P501 

(Monoclonal antibody) 



1 MW Biolabs (175/83/62/47.5/32.5/16.5 kDa)

2 Y1796 purified

3 Y1795 Crude Extract (negative control)

4 SC333 Crude Extract 

5 Y1796 Crude Extract 

6 Y1790 Crude Extract 

7 Y1802 Crude Extract 
FIG. 9A.




CP2C-P501S 



FIG. 9B. 




CP2C-P501S 



1 - Molecular Weight Marker (Biolabs - Broad Range) 175; 83; 62; 47.5; 32.5; 25; 16.5; 6.5 kD - 10

2 - Purified Reference CP2CP501S/I2 135 ng 

3 - Purified Reference CP2CP501S/12 67.8 ng 

4 - Purified Reference CP2CP501S/I2 33.9 ng 

5 - Purified Reference CP2CP501S/12 16.9 ng 

6 - Fermentation PRO119-21h30

7 - Fermentation PRO124-21h30

8 - Fermentation PRO124-22h30

9 - Fermentation PRO127-0h

10 - Fermentation PRO127-4h

11 - Fermentation PRO127-6h

12 - Fermentation PRO127-22h20

13 - Fermentation PRO127-22h45
FIG. 10. Purification scheme of CPC-P501-His produced by Y1796.

S. cerevisiae cells

Cell disruption: OD 120 / 2 passes / 20 mM Tris pH 8.5 - 5 mM EDTA

Centrifugation: 12.000 g / RT / 90 min (supernatant discarded)

Pellet washing step 1: 20 mM Tris pH 8.5 - 0.15 M NaCl - 2.0 M Guanidine.HCl - 0.1% Empigen (30 min / RT)

Centrifugation: 12.000 g / RT / 60 min (supernatant discarded)

Pellet washing step 2: 20 mM Tris pH 8.5 - 0.15 M NaCl - 4.0 M Urea

Centrifugation: 12.000 g / RT / 30 min (supernatant discarded)

Solubilisation / Reduction: 20 mM Tris pH 8.5 - 0.15 M NaCl - 8.0 M Urea - 1% SDS - 0.2 M Glutathione (60 min / RT)

Centrifugation: 12.000 g / RT / 30 min (pellet discarded)

Carbamidomethylation: 0.3 M Iodoacetamide (30 min / RT / in the dark) / pH adjusted to 8.5 (with 5 M NaOH solution) before incubation

R/C Supernatant

10-fold dilution and pH adjustment (8.5): Dilution buffer: 20 mM Tris pH 8.5 - 1 M NaCl - 8.0 M Urea

Immobilised metal ion affinity chromatography on Ni++-Chelating Sepharose FF (Amersham) (10 x 25 cm column - 2000 ml):
Equilibration buffer: 20 mM Tris pH 8.5 - 0.9 M NaCl - 8.0 M Urea - 0.1% SDS
Washing buffers:
1) Equilibration buffer
2) 20 mM Tris pH 8.5 - 0.15 M NaCl - 8.0 M Urea - 0.1% SDS
3) 20 mM Tris pH 8.5 - 8.0 M Urea - 0.1% Tween 80
Elution buffer: 20 mM Tris pH 8.5 - 8.0 M Urea - 0.1% Tween 80 - 0.5 M Imidazole

2-fold dilution and pH adjustment (10.0): 20 mM Piperazine pH 10.0 - 8.0 M Urea - 0.1% Tween 80

Anion exchange chromatography on Q Sepharose FF (Amersham) (2.6 x 6.5 cm column - 35 ml):
Equilibration buffer: 20 mM Piperazine pH 10.0 - 8.0 M Urea - 0.1% Tween 80
Washing buffers:
1) Equilibration buffer
2) 20 mM Tris pH 8.5 - 8.0 M Urea - 0.1% Tween 80
Elution buffer: 20 mM Tris pH 7.5 - 8.0 M Urea - 0.1% Tween 80 - 0.5 M NaCl

Concentration/Diafiltration (Pall - Omega 10 kDa - 200 cm2):
+/- 3-fold concentration
Diafiltration buffer: Tris 20 mM pH 7.5

Sterile filtration (Millipore - Millex GV 0.22 µm)

Purified bulk: Final buffer: 20 mM Tris pH 7.5 - +/- 0.3% Tween 80

Storage -20°C
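
Two pieces of arithmetic in the scheme can be checked directly: the composition reached after the 10-fold dilution before the metal affinity column, and the bed volumes quoted for the two columns. The sketch below is an illustration under stated assumptions. It ignores the volume of iodoacetamide and NaOH added before the dilution, assumes the dilution buffer contains no SDS, and reads the column dimensions as diameter x bed height; the helper names are hypothetical.

```python
import math


def mix(conc_sample: float, conc_diluent: float, fold: float) -> float:
    """Final concentration after diluting one volume of sample to `fold` volumes."""
    return (conc_sample + (fold - 1) * conc_diluent) / fold


# Solubilisation buffer diluted 10-fold in the dilution buffer.
nacl = mix(0.15, 1.0, 10)   # M
urea = mix(8.0, 8.0, 10)    # M
sds = mix(1.0, 0.0, 10)     # % (assumes no SDS in the dilution buffer)


def bed_volume_ml(diameter_cm: float, height_cm: float) -> float:
    """Volume of a cylindrical column bed; 1 cm^3 equals 1 ml."""
    return math.pi * (diameter_cm / 2) ** 2 * height_cm


print(round(nacl, 3), round(urea, 1), round(sds, 2))
print(round(bed_volume_ml(10, 25)), round(bed_volume_ml(2.6, 6.5)))
```

With these assumptions the diluted sample comes out close to the equilibration buffer of the affinity column (about 0.9 M NaCl, 8.0 M urea, 0.1% SDS), and the computed bed volumes are close to the quoted 2000 ml and 35 ml.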






WO 03/104272 PCT/EP03/06096 


FIG. 11. Pattern of CPC P501 His purified protein (4-12% Novex Nu-Page polyacrylamide 
precasted gels) 




Coomassie Blue R250 Daiichi Silver Staining 




1: MW (250/150/75/50/37/25/15/10 kDa) 
2: Purified bulk A (reducing conditions) 
3: Purified bulk B (reducing conditions) 
4: Purified bulk C (reducing conditions) 
5: Purified bulk A (non reducing conditions) 
6: Purified bulk B (non reducing conditions) 
7: Purified bulk C (non reducing conditions) 



Western Blot anti P501S 
(Monoclonal antibody) 
FIG. 12. Native full-length P501S sequence (SEQ ID NO:17 & 43)

Nucleotide sequence: SEQ ID NO.17 
Polypeptide sequence: SEQ ID NO.43

GCCACCATGGTCCAGAGGCTGTGGGTGAGCCGCCTGCTGCGGCACCGG 

MVQRLWVSRLLRHR 14 

AAAGCCCAGCTCTTGCTGGTCAACCTGCTAACCTTTGGCCTGGAGGTGTGTTTGGCCGCA 
KAQLLLVNLLTFGLEVCLAA 34

GGCATCACCTATGTGCCGCCTCTGCTGCTGGAAGTGGGGGTAGAGGAGAAGTTCATGACC 
GITYVPPLLLEVGVEEKFMT 54 

ATGGTGCTGGGCATTGGTCCAGTGCTGGGCCTGGTCTGTGTCCCGCTCCTAGGCTCAGCC 
MVLGIGPVLGLVCVPLLGSA 74 

AGTGACCACTGGCGTGGACGCTATGGCCGCCGCCGGCCCTTCATCTGGGCACTGTCCTTG 
SDHWRGRYGRRRPFIWALSL 94 

GGCATCCTGCTGAGCCTCTTTCTCATCCCAAGGGCCGGCTGGCTAGCAGGGCTGCTGTGC 
GILLSLFLIPRAGWLAGLLC 114

CCGGATCCCAGGCCCCTGGAGCTGGCACTGCTCATCCTGGGCGTGGGGCTGCTGGACTTC 
PDPRPLELALLILGVGLLDF 134 

TGTGGCCAGGTGTGCTTCACTCCACTGGAGGCCCTGCTCTCTGACCTCTTCCGGGACCCG 
CGQVCFTPLEALLSDLFRDP 154

GACCACTGTCGCCAGGCCTACTCTGTCTATGCCTTCATGATCAGTCTTGGGGGCTGCCTG 
DHCRQAYSVYAFMISLGGCL 174

GGCTACCTCCTGCCTGCCATTGACTGGGACACCAGTGCCCTGGCCCCCTACCTGGGCACC 
GYLLPAIDWDTSALAPYLGT 194 

CAGGAGGAGTGCCTCTTTGGCCTGCTCACCCTCATCTTCCTCACCTGCGTAGCAGCCACA 
QEECLFGLLTLIFLTCVAAT 214 

CTGCTGGTGGCTGAGGAGGCAGCGCTGGGCCCCACCGAGCCAGCAGAAGGGCTGTCGGCC 
LLVAEEAALGPTEPAEGLSA 234 

CCCTCCTTGTCGCCCCACTGCTGTCCATGCCGGGCCCGCTTGGCTTTCCGGAACCTGGGC 
PSLSPHCCPCRARLAFRNLG 254 

GCCCTGCTTCCCCGGCTGCACCAGCTGTGCTGCCGCATGCCCCGCACCCTGCGCCGGCTC 
ALLPRLHQLCCRMPRTLRRL 274 

TTCGTGGCTGAGCTGTGCAGCTGGATGGCACTCATGACCTTCACGCTGTTTTACACGGAT 
FVAELCSWMALMTFTLFYTD 294 

TTCGTGGGCGAGGGGCTGTACCAGGGCGTGCCCAGAGCTGAGCCGGGCACCGAGGCCCGG 
FVGEGLYQGVPRAEPGTEAR 314 

AGACACTATGATGAAGGCGTTCGGATGGGCAGCCTGGGGCTGTTCCTGCAGTGCGCCATC 
RHYDEGVRMGSLGLFLQCAI 334 

TCCCTGGTCTTCTCTCTGGTCATGGACCGGCTGGTGCAGCGATTCGGCACTCGAGCAGTC 
SLVFSLVMDRLVQRFGTRAV 354 

TATTTGGCCAGTGTGGCAGCTTTCCCTGTGGCTGCCGGTGCCACATGCCTGTCCCACAGT 
YLASVAAFPVAAGATCLSHS 374 

GTGGCCGTGGTGACAGCTTCAGCCGCCCTCACCGGGTTCACCTTCTCAGCCCTGCAGATC 
VAVVTASAALTGFTFSALQI 394 

CTGCCCTACACACTGGCCTCCCTCTACCACCGGGAGAAGCAGGTGTTCCTGCCCAAATAC 
LPYTLASLYHREKQVFLPKY 414 

CGAGGGGACACTGGAGGTGCTAGCAGTGAGGACAGCCTGATGACCAGCTTCCTGCCAGGC 
RGDTGGASSEDSLMTSFLPG 434 

CCTAAGCCTGGAGCTCCCTTCCCTAATGGACACGTGGGTGCTGGAGGCAGTGGCCTGCTC 
PKPGAPFPNGHVGAGGSGLL 454 

CCACCTCCACCCGCGCTCTGCGGGGCCTCTGCCTGTGATGTCTCCGTACGTGTGGTGGTG 
PPPPALCGASACDVSVRVVV 474 

GGTGAGCCCACCGAGGCCAGGGTGGTTCCGGGCCGGGGCATCTGCCTGGACCTCGCCATC 
GEPTEARVVPGRGICLDLAI 494 

CTGGATAGTGCCTTCCTGCTGTCCCAGGTGGCCCCATCCCTGTTTATGGGCTCCATTGTC 
LDSAFLLSQVAPSLFMGSIV 514 

CAGCTCAGCCAGTCTGTCACTGCCTATATGGTGTCTGCCGCAGGCCTGGGTCTGGTCGCC 
QLSQSVTAYMVSAAGLGLVA 534 

ATTTACTTTGCTACACAGGTAGTATTTGACAAGAGCGACTTGGCCAAATACTCAGCGTAG 
IYFATQVVFDKSDLAKYSA* 554 



GTCGAG 
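The paired nucleotide and polypeptide lines of FIG. 12 can be checked against each other with a short translation routine. The following is a minimal sketch, assuming the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative and do not come from the specification. It translates the first and last nucleotide lines of FIG. 12, starting from the first ATG of the first line.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon; '*' marks a stop codon."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First nucleotide line of FIG. 12; the bases before the first ATG are not translated.
first_line = "GCCACCATGGTCCAGAGGCTGTGGGTGAGCCGCCTGCTGCGGCACCGG"
first_peptide = translate(first_line[first_line.find("ATG"):])

# Last coding line of FIG. 12, which ends in a stop codon.
last_line = "ATTTACTTTGCTACACAGGTAGTATTTGACAAGAGCGACTTGGCCAAATACTCAGCGTAG"
last_peptide = translate(last_line)

print(first_peptide)
print(last_peptide)
```

Applied line by line, the same routine gives a quick way to spot transcription errors in either the nucleotide or the polypeptide rows.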



WO 03/104272 



PCT/EP03/06096 



FIG. 13. Sequence of the CPC-P501S expression cassette of JNW735 (SEQ ID NO:18 & 44) 

Nucleotide sequence: SEQ ID NO. 18 
Polypeptide sequence: SEQ ID NO.44 



GACAAGTTTGAGAAAATCAATGGCACTTGGTACTACTTTGACAGTTCAGGCTATATGCTT 
DKFEKINGTWYYFDSSGYML 34 

GCAGACCGCTGGAGGAAGCACACAGACGGCAACTGGTACTGGTTCGACAACTCAGGCGAA 
ADRWRKHTDGNWYWFDNSGE 54 

ATGGCTACAGGCTGGAAGAAAATCGCTGATAAGTGGTACTATTTCAACGAAGAAGGTGCC 
MATGWKKIADKWYYFNEEGA 74 

ATGAAGACAGGCTGGGTCAAGTACAAGGACACTTGGTACTACTTAGACGCTAAAGAAGGC 
MKTGWVKYKDTWYYLDAKEG 94 

GCCATGCAATACATCAAGGCTAACTCTAAGTTCATTGGTATCACTGAAGGCGTCATGGTA 
AM|QYIKANSKFIGITE|GVMV 114 

TCAAATGCCTTTATCCAGTCAGCGGACGGAACAGGCTGGTACTACCTCAAACCAGACGGA 
SNAFIQSADGTGWYYLKPDG 134 

ACACTGGCAGACAGGCCAGAAAAGTTCATGTACATGGTGCTGGGCATTGGTCCAGTGCTG 

TLADRPEKFMYMVLGIGPVL 154 

GGCCTGGTCTGTGTCCCGCTCCTAGGCTCAGCCAGTGACCACTGGCGTGGACGCTATGGC 
GLVCVPLLGSASDHWRGRYG 174 

CGCCGCCGGCCCTTCATCTGGGCACTGTCCTTGGGCATCCTGCTGAGCCTCTTTCTCATC 
RRRPFIWALSLGILLSLFLI 194 

CCAAGGGCCGGCTGGCTAGCAGGGCTGCTGTGCCCGGATCCCAGGCCCCTGGAGCTGGCA 
PRAGWLAGLLCPDPRPLELA 214 

CTGCTCATCCTGGGCGTGGGGCTGCTGGACTTCTGTGGCCAGGTGTGCTTCACTCCACTG 
LLILGVGLLDFCGQVCFTPL 234 

GAGGCCCTGCTCTCTGACCTCTTCCGGGACCCGGACCACTGTCGCCAGGCCTACTCTGTC 
EALLSDLFRDPDHCRQAYSV 254 

TATGCCTTCATGATCAGTCTTGGGGGCTGCCTGGGCTACCTCCTGCCTGCCATTGACTGG 
YAFMISLGGCLGYLLPAIDW 274 

GACACCAGTGCCCTGGCCCCCTACCTGGGCACCCAGGAGGAGTGCCTCTTTGGCCTGCTC 
DTSALAPYLGTQEECLFGLL 294 

ACCCTCATCTTCCTCACCTGCGTAGCAGCCACACTGCTGGTGGCTGAGGAGGCAGCGCTG 
TLIFLTCVAATLLVAEEAAL 314 

GGCCCCACCGAGCCAGCAGAAGGGCTGTCGGCCCCCTCCTTGTCGCCCCACTGCTGTCCA 
GPTEPAEGLSAPSLSPHCCP 334 

TGCCGGGCCCGCTTGGCTTTCCGGAACCTGGGCGCCCTGCTTCCCCGGCTGCACCAGCTG 
CRARLAFRNLGALLPRLHQL 354 

TGCTGCCGCATGCCCCGCACCCTGCGCCGGCTCTTCGTGGCTGAGCTGTGCAGCTGGATG 
CCRMPRTLRRLFVAELCSWM 374 

GCACTCATGACCTTCACGCTGTTTTACACGGATTTCGTGGGCGAGGGGCTGTACCAGGGC 
ALMTFTLFYTDFVGEGLYQG 394 

GTGCCCAGAGCTGAGCCGGGCACCGAGGCCCGGAGACACTATGATGAAGGCGTTCGGATG 
VPRAEPGTEARRHYDEGVRM 414 

GGCAGCCTGGGGCTGTTCCTGCAGTGCGCCATCTCCCTGGTCTTCTCTCTGGTCATGGAC 
GSLGLFLQCAISLVFSLVMD 434 

CGGCTGGTGCAGCGATTCGGCACTCGAGCAGTCTATTTGGCCAGTGTGGCAGCTTTCCCT 
RLVQRFGTRAVYLASVAAFP 454 

GTGGCTGCCGGTGCCACATGCCTGTCCCACAGTGTGGCCGTGGTGACAGCTTCAGCCGCC 
VAAGATCLSHSVAVVTASAA 474 

CTCACCGGGTTCACCTTCTCAGCCCTGCAGATCCTGCCCTACACACTGGCCTCCCTCTAC 
LTGFTFSALQILPYTLASLY 494 

CACCGGGAGAAGCAGGTGTTCCTGCCCAAATACCGAGGGGACACTGGAGGTGCTAGCAGT 
HREKQVFLPKYRGDTGGASS 514 

GAGGACAGCCTGATGACCAGCTTCCTGCCAGGCCCTAAGCCTGGAGCTCCCTTCCCTAAT 
EDSLMTSFLPGPKPGAPFPN 534 

GGACACGTGGGTGCTGGAGGCAGTGGCCTGCTCCCACCTCCACCCGCGCTCTGCGGGGCC 
GHVGAGGSGLLPPPPALCGA 554 

TCTGCCTGTGATGTCTCCGTACGTGTGGTGGTGGGTGAGCCCACCGAGGCCAGGGTGGTT 
SACDVSVRVVVGEPTEARVV 574 

CCGGGCCGGGGCATCTGCCTGGACCTCGCCATCCTGGATAGTGCCTTCCTGCTGTCCCAG 
PGRGICLDLAILDSAFLLSQ 594 

GTGGCCCCATCCCTGTTTATGGGCTCCATTGTCCAGCTCAGCCAGTCTGTCACTGCCTAT 
VAPSLFMGSIVQLSQSVTAY 614 

ATGGTGTCTGCCGCAGGCCTGGGTCTGGTCGCCATTTACTTTGCTACACAGGTAGTATTT 
MVSAAGLGLVAIYFATQVVF 634 

FIG. 14 - Codon optimised P501S sequences (SEQ ID NO:19-20) 
SEQ ID NO:19 

ATGGTGCAGCGGCTCTGGGTGAGCCGCCTCCTGCGGCATCGCAAGGCCCAGCTCCTGCTGGTGAATCTGCTCA 
CATTCGGCCTGGAGGTGTGCCTGGCCGCCGGCATCACCTACGTGCCCCCCCTCCTGCTGGAGGTGGGAGTCGA 
GGAGAAGTTCATGACCATGGTGCTGGGCATTGGGCCCGTCCTGGGCCTCGTGTGCGTGCCTCTCCTCGGCAGC 
GCTTCCGACCATTGGCGCGGCCGGTATGGCCGCAGGAGACCCTTCATCTGGGCTCTGAGTCTCGGCATCCTGC 
TGAGCCTGTTCCTGATCCCTCGGGCCGGCTGGCTGGCCGGGCTGCTGTGCCCCGATCCTCGGCCCCTGGAGCT 
GGCCCTGCTGATCCTCGGCGTGGGCCTGCTGGACTTCTGCGGCCAGGTGTGCTTCACGCCCCTGGAGGCACTG 
CTGAGCGACCTGTTCCGGGACCCCGACCATTGCCGCCAGGCGTACAGCGTGTACGCCTTCATGATCTCCCTGG 
GAGGCTGCCTGGGCTACCTGCTCCCCGCCATCGATTGGGACACCAGCGCACTCGCCCCCTATCTCGGAACACA 
GGAGGAATGCCTGTTCGGATTGTTGACGCTCATCTTCCTCACGTGCGTCGCGGCCACCCTGTTGGTGGCCGAG 
GAGGCCGCCCTGGGGCCCACCGAGCCGGCCGAGGGACTGAGCGCCCCGAGCCTGAGTCCACACTGCTGCCCTT 
GCCGGGCCCGCCTGGCCTTCCGTAATCTGGGCGCCCTCCTGCCTCGGCTCCATCAGCTGTGTTGCAGAATGCC 
TAGGACGCTGCGGCGCCTGTTCGTCGCTGAGTTGTGCTCCTGGATGGCTCTCATGACCTTCACCCTGTTTTAT 
ACGGACTTCGTCGGGGAGGGCCTGTACCAGGGGGTGCCGCGCGCCGAGCCCGGGACAGAGGCGCGCCGCCACT 
ACGACGAGGGAGTGCGTATGGGCTCCCTGGGCCTCTTCTTGCAGTGCGCCATCAGTCTGGTTTTCTCTCTGGT 
CATGGACAGGCTGGTGCAGCGCTTCGGAACCCGGGCGGTGTACCTGGCGAGCGTGGCCGCCTTCCCCGTGGCT 
GCCGGCGCCACCTGCCTCTCTCACTCGGTGGCCGTGGTCACCGCCAGCGCCGCCCTGACCGGGTTCACCTTCT 
CTGCCCTGCAGATTCTGCCTTACACCCTGGCCAGCCTGTACCATCGCGAGAAACAGGTGTTTCTCCCCAAGTA 
CAGAGGCGACACCGGGGGCGCCTCCAGCGAGGACAGCCTCATGACCTCCTTCCTGCCTGGCCCCAAGCCCGGC 
GCCCCTTTCCCCAACGGGCACGTGGGCGCCGGCGGGAGTGGGCTCCTGCCCCCCCCTCCTGCGCTGTGCGGGG 
CCAGCGCCTGCGACGTGAGCGTGCGCGTGGTGGTGGGCGAGCCCACCGAGGCCCGCGTGGTGCCGGGCAGAGG 
CATTTGTCTGGACCTGGCCATCCTCGACTCCGCCTTCCTCCTCAGCCAGGTGGCCCCGTCCCTCTTCATGGGC 
TCTATCGTCCAGCTGTCTCAGAGCGTCACCGCTTACATGGTGTCCGCTGCTGGACTGGGCTTGGTGGCTATTT 
ATTTCGCCACCCAGGTGGTGTTCGACAAGAGCGACCTGGCCAAATACTCCGCCTGA 

SEQ ID NO:20 

ATGGTGCAGCGGCTGTGGGTGTCCCGGCTGCTGCGCCATAGAAAGGCCCAGTTGCTGCTGGTGAACCTGCTGA 
CTTTCGGACTGGAGGTGTGCCTGGCTGCCGGGATCACGTACGTGCCCCCCCTGCTGCTGGAGGTGGGCGTGGA 
GGAGAAGTTCATGACAATGGTGCTGGGCATCGGCCCCGTCCTGGGCCTCGTGTGTGTGCCCCTCCTCGGGAGT 
GCGTCCGATCATTGGCGGGGCCGCTACGGCCGCCGCAGACCGTTCATCTGGGCCCTGAGCCTGGGGATCCTGC 
TCTCTCTCTTCCTGATCCCCCGGGCCGGCTGGCTGGCCGGCCTGCTGTGTCCCGACCCCCGCCCTCTGGAGCT 
GGCCCTCCTGATCCTGGGCGTGGGCTTGTTGGACTTCTGCGGCCAGGTGTGTTTCACTCCCCTGGAGGCTCTG 
CTCTCCGACCTCTTCCGCGACCCCGACCACTGTAGGCAGGCTTACAGCGTGTACGCCTTCATGATCAGTCTGG 
GGGGATGCCTGGGCTATCTGCTGCCCGCTATCGACTGGGACACCAGCGCCCTGGCCCCCTACCTGGGGACTCA 
GGAGGAGTGCCTGTTCGGCCTGCTCACCTTGATCTTCCTGACGTGCGTCGCCGCCACCCTGCTGGTGGCCGAG 
GAGGCGGCCCTGGGGCCCACCGAGCCCGCCGAGGGCCTGAGCGCTCCCAGCCTGAGCCCCCATTGCTGCCCGT 
GCAGGGCTAGGCTCGCCTTCAGGAATCTGGGCGCTTTGCTGCCCCGCCTGCATCAGCTGTGCTGTCGCATGCC 
TCGCACCCTGCGCCGCCTGTTCGTCGCTGAGCTCTGTTCCTGGATGGCCCTGATGACGTTCACCCTCTTCTAC 
ACCGACTTCGTGGGGGAGGGCCTGTACCAGGGCGTGCCCAGGGCCGAGCCCGGCACCGAGGCTAGGCGCCATT 
ACGACGAGGGCGTCAGGATGGGCTCTCTGGGCCTCTTCCTGCAGTGCGCCATCAGTCTGGTGTTCTCTCTGGT 
GATGGACCGGCTGGTGCAGCGCTTCGGCACCCGGGCCGTGTACCTCGCCTCTGTGGCGGCTTTCCCCGTCGCC 
GCCGGCGCGACCTGCCTGTCTCATTCTGTCGCCGTGGTGACCGCCAGCGCCGCCCTGACCGGCTTCACCTTCA 
GTGCGCTCCAGATTCTGCCCTACACCCTGGCGTCTCTGTACCATCGCGAGAAGCAGGTGTTCCTGCCCAAGTA 
CCGCGGGGACACAGGGGGAGCTTCCTCTGAGGACAGCCTGATGACCAGCTTCTTGCCCGGCCCCAAGCCGGGG 
GCCCCTTTCCCCAACGGCCATGTCGGGGCGGGCGGCAGCGGCCTGCTCCCTCCCCCCCCCGCCCTGTGCGGCG 
CTAGTGCCTGCGACGTGAGCGTGCGGGTGGTGGTGGGGGAGCCCACCGAGGCTAGGGTCGTGCCTGGCCGGGG 
GATCTGCCTGGACCTGGCCATCCTCGACTCCGCCTTCCTGCTCTCCCAGGTGGCGCCCAGCCTGTTCATGGGC 
AGTATCGTGCAGCTGAGCCAGAGCGTGACCGCCTACATGGTGAGCGCCGCCGGCCTGGGGTTGGTGGCCATCT 
ACTTTGCCACCCAGGTCGTGTTCGACAAGAGCGATCTCGCCAAGTATAGCGCCTGA 
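Codon optimisation changes the nucleotide sequence while leaving the encoded polypeptide unchanged. The sketch below, offered as an illustration rather than as the method used to produce SEQ ID NO:19 and 20, compares the first 14 codons of the native sequence (FIG. 12) with the corresponding codons of the two optimised sequences and confirms that every difference is synonymous. The helper names `translate` and `changed_codons` are assumptions introduced here, and the standard genetic code is assumed.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(dna: str) -> list[str]:
    """Split a DNA string into complete codons."""
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def translate(dna: str) -> str:
    """Translate DNA codon by codon; '*' marks a stop codon."""
    return "".join(CODON_TABLE[c] for c in codons(dna))


def changed_codons(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """Return (codon number, reference codon, variant codon) for every difference."""
    pairs = zip(codons(reference), codons(variant))
    return [(n, r, v) for n, (r, v) in enumerate(pairs, start=1) if r != v]


# First 14 codons of each open reading frame, copied from FIG. 12 and FIG. 14.
native = "ATGGTCCAGAGGCTGTGGGTGAGCCGCCTGCTGCGGCACCGG"
seq19 = "ATGGTGCAGCGGCTCTGGGTGAGCCGCCTCCTGCGGCATCGC"
seq20 = "ATGGTGCAGCGGCTGTGGGTGTCCCGGCTGCTGCGCCATAGA"

for name, variant in (("SEQ ID NO:19", seq19), ("SEQ ID NO:20", seq20)):
    diffs = changed_codons(native, variant)
    synonymous = all(CODON_TABLE[r] == CODON_TABLE[v] for _, r, v in diffs)
    print(name, "changed codons:", len(diffs), "all synonymous:", synonymous)
```

Running the same comparison over the full-length sequences would flag any position where a transcription error alters the encoded residue.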
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FIG. 15 - Re-engineered codon optimised sequence 19 (SEQ ID NO:21) 

GACG GCTAGC GCCACCATGGTGCAGCGGCTCTGGGTGAGCCGCCTCCTGCGGCATCGCAAGGCCCAGCTCCTG 
CTGGTGAATCTGCTCACATTCGGCCTGGAGGTGTGCCTGGCCGCCGGCATCACCTACGTGCCCCCCCTCCTGC 
TGGAGGTGGGAGTCGAGGAGAAGTTCATGACCATGGTGCTGGGCATTGGGCCCGTCCTGGGCCTCGTGTGCGT 
GCCTCTCCTCGGCAGCGCTTCCGACCATTGGCGCGGCCGGTATGGCCGCAGGAGACCCTTCATCTGGGCTCTG 
AGTCTCGGCATCCTGCTGAGCCTGTTCCTGATCCCTCGGGCCGGCTGGCTGGCCGGGCTGCTGTGCCCCGATC 
CTCGGCCCCTGGAGCTGGCCCTGCTGATCCTCGGCGTGGGCCTGCTGGACTTCTGCGGCCAGGTGTGCTTCAC 

gcccctggaggcactgctgagcgacctgttccgggaccccgaccattgccgccaggcgtacagcgtgtacgcc 
ttcatgatctccctgggaggctgcctgggctacctgctccccgccatcgattgggacaccagcgcactcgccc 
cctatctcggaacacaggaggaatgcctgttcggaQtgQtgacgctcatcttcctcacgtgcgtcgcggccac 
cctgttggtggccgaggaggccgccctggggcccaccgagccggccgagggactgagcgccccgagcctgagt 
ccacactgctgcccttgccgggcccgcctggccttccgtaatctgggcgccctcctgcctcggctccatcagc 

TGTGTTGCAGAATGCCTAGGACGCTGCGGCGCCTGTTCGTCGCTGAGTTGTGCTCCTGGATGGCTCTCATGAC 
CTTCACCCTGTTTTATACGGACTTCGTCGGGGAGGGCCTGTACCAGGGGGTGCCGCGCGCCGAGCCCGGGACA 

TGGTTTTCTCTCTGGTCATGGACAGGCTGGTGCAGCGCTTCGGAACCCGGGCGGTGTACCTGGCGAGCGTGGC 
CGCCTTCCCCGTGGCTGCCGGCGCCACCTGCCTCTCTCACTCGGTGGCCGTGGTCACCGCCAGCGCCGCCCTG 
ACCGGGTTCACCTTCTCTGCCCTGCAGATTCTGCCTTACACCCTGGCCAGCCTGTACCATCGCGAGAAACAGG 
TGTTTCTCCCCAAGTACAGAGGCGACACCGGGGGCGCCTCCAGCGAGGACAGCCTCATGACCTCCTTCCTGCC 
TGGCCCCAAGCCCGGCGCCCCTTTCCCCAACGGGCACGTGGGCGCCGGCGGGAGTGGGCTCCTGCCCCCCCCT 
CCTGCGCTGTGCGGGGCCAGCGCCTGCGACGTGAGCGTGCGCGTGGTGGTGGGCGAGCCCACCGAGGCCCGCG 
TGGTGCCGGGCAGAGGCATTTGTCTGGACCTGGCCATCCTCGACTCCGCCTTCCTCCTCAGCCAGGTGGCCCC 
GTCCCTCTTCATGGGCTCTATCGTCCAGCTGTCTCAGAGCGTCACCGCTTACATGGTGTCCGCTGCTGGACTG 
GGCTTGGTGGCTATTTATTTCGCCACCCAGGTGGTGTTCGACAAGAGCGACCTGGCCAAATACTCCGCCTGAC 
TCGAGGCAG 

FIG. 16 - Re-engineered codon optimised sequence 20 (SEQ ID NO:22) 

GACG GCTAGC GCCACCATGGTGCAGCGGCTGTGGGTGTCCCGGCTGCTGCGCCATAGAAAGGCCCAGTTGCTG 
CTGGTGAACCTGCTGACTTTCGGACTGGAGGTGTGCCTGGCTGCCGGGATCACGTACGTGCCCCCCCTGCTGC 
TGGAGGTGGGCGTGGAGGAGAAGTTCATGACAATGGTGCTGGGCATCGGCCCCGTCCTGGGCCTCGTGTGTGT 
GCCCCTCCTCGGGAGTGCGTCCGATCATTGGCGGGGCCGCTACGGCCGCCGCAGACCGTTCATCTGGGCCCTG 
AGCCTGGGCATCCTGCTCTCTCTCTTCCTGATCCCCCGGGCCGGCTGGCTGGCCGGCCTGCTGTGTCCCGACC 
CCCGCCCTCTGGAGCTGGCCCTCCTGATCCTGGGCGTGGGCQrGgrGGACTTCTGCGGCCAGGTGTGTTTCAC 
TCCCCTGGAGGCTCTGCTCTCCGACCTCTTCCGCGACCCCGACCACTGTAGGCAGGCTTACAGCGTGTACGCC 
TTCATGATCAGTCTGGGGGGATGCCTGGGCTATCTGCTGCCCGCTATCGACTGGGACACCAGCGCCCTGGCCC 
CCTACCTGGGGACTCAGGAGGAGTGCCTGTTCGGCCTGCTCACCTTGATCTTCCTGACGTGCGTCGCCGCCAC 
CCTGCTGGTGGCCGAGGAGGCGGCCCTGGGGCCCACCGAGCCCGCCGAGGGCCTGAGCGCTCCCAGCCTGAGC 
CCCCATTGCTGCCCGTGCAGGGCTAGGCTCGCCTTCAGGAATCTGGGCGCTTTGCTGCCCCGCCTGCATCAGC 
TGTGCTGTCGCATGCCTCGCACCCTGCGCCGCCTGTTCGTCGCTGAGCTCTGTTCCTGGATGGCCCTGATGAC 
GTTCACCCTCTTCTACACCGACTTCGTGGGGGAGGGCCTGTACCAGGGCGTGCCCAGGGCCGAGCCCGGCACC 
GAGGCTAGGCGCCATTACGACGAGGGCGTCAGGATGGGCTCTCTGGGCCTCTTCCTGCAGTGCGCCATCAGTC 
TGGTGTTCTCTCTGGTGATGGACCGGCTGGTGCAGCGCTTCGGCACCCGGGCCGTGTACCTCGCCTCTGTGGC 
GGCTTTCCCCGTCGCCGCCGGCGCGACCTGCCTGTCTCATTCTGTCGCCGTGGTGACCGCCAGCGCCGCCCTG 
ACCGGCTTCACCTTCAGTGCGCTCCAGATTCTGCCCTACACCCTGGCGTCTCTGTACCATCGCGAGAAGCAGG 
TGTTCCTGCCCAAGTACCGCGGGGACACAGGGGGAGCTTCCTCTGAGGACAGCCTGATGACCAGCTTCTTGCC 
CGGCCCCAAGCCGGGGGCCCCTTTCCCCAACGGCCATGTCGGGGCGGGCGGCAGCGGCCTGCTCCCTCCCCCC 
CCCGCCCTGTGCGGCGCTAGTGCCTGCGACGTGAGCGTGCGGGTGGTGGTGGGGGAGCCCACCGAGGCTAGGG 
TCGTGCCTGGCCGGGGGATCTGCCTGGACCTGGCCATCCTCGACTCCGCCTTCCTGCTCTCCCAGGTGGCGCC 
CAGCCTGTTCATGGGCAGTATCGTGCAGCTGAGCCAGAGCGTGACCGCCTACATGGTGAGCGCCGCCGGCCTG 
GGGTTGGTGGCCATCTACTTTGCCACCCAGGTCGTGTTCGACAAGAGCGATCTCGCCAAGTATAGCGCCTGAC 
TCGAGGCAG 
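The re-engineered sequences of FIG. 15 and FIG. 16 begin with GACG GCTAGC GCCACC ahead of the ATG and end with CTCGAG GCAG after the stop codon. GCTAGC and CTCGAG are the recognition sequences of NheI and XhoI, and GCCACC immediately before ATG is a Kozak-type context; reading these flanks as cloning sites is an interpretation, since this part of the figure does not label them. The sketch below locates these elements and the open reading frame between them. The function name `find_orf` and the shortened `demo` sequence are hypothetical stand-ins, not sequences from the specification.

```python
KOZAK = "GCCACC"
SITES = {"NheI": "GCTAGC", "XhoI": "CTCGAG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def clean(seq: str) -> str:
    """Remove whitespace and upper-case the sequence."""
    return "".join(seq.split()).upper()


def find_orf(seq: str) -> tuple[int, int]:
    """Return (start, end) of the ORF that follows the Kozak context.

    start is the index of the A of ATG; end is one past the stop codon.
    """
    seq = clean(seq)
    hit = seq.find(KOZAK + "ATG")
    if hit < 0:
        raise ValueError("no Kozak context followed by ATG")
    start = hit + len(KOZAK)
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return start, i + 3
    raise ValueError("no in-frame stop codon")


# Hypothetical shortened example with the same flanks as FIG. 15 and FIG. 16.
demo = "GACG GCTAGC GCCACC ATGGTGCAGCGGCTC TGA CTCGAG GCAG"
start, end = find_orf(demo)
site_positions = {name: clean(demo).find(site) for name, site in SITES.items()}

print("ORF:", clean(demo)[start:end])
print("site positions:", site_positions)
```

A position of -1 in `site_positions` would indicate that the corresponding recognition sequence is absent from the input.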
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FIG. 17 - The starting sequence for the optimisation of CPC (SEQ ID NO:23) 

Four amino acids of P501S sequence are boxed. 

ATGGCGGCCGCTTACGTACATTCCGACGGCTCTTATCCAAAAGACAAGTTTGAGAAAATCAATGGCACTTGGT 
ACTACTTTGACAGTTCAGGCTATATGCTTGCAGACCGCTGGAGGAAGCACACAGACGGCAACTGGTACTGGTT 
CGACAACTCAGGCGAAATGGCTACAGGCTGGAAGAAAATCGCTGATAAGTGGTACTATTTCAACGAAGAAGGT 
GCCATGAAGACAGGCTGGGTCAAGTACAAGGACACTTGGTACTACTTAGACGCTAAAGAAGGCGCCATGCAAT 
ACATCAAGGCTAACTCTAAGTTCATTGGTATCACTGAAGGCGTCATGGTATCAAATGCCTTTATCCAGTCAGC 
GGACGGAACAGGCTGGTACTACCTCAAACCAGACGGAACACTGGCAGACAGGCCAGAA|AAGTTCATGTAC| 



FIG. 18 - Representative codon optimised CPC sequences (SEQ ID NO:24-25) 
SEQ ID NO:24 

ATGGCCGCCGCCTACGTGCATAGCGACGGGAGCTACCCCAAGGACAAGTTCGAGAAGATCAACGGGACATGGT 
ACTACTTCGACTCCTCCGGCTACATGCTCGCCGACCGCTGGCGGAAGCACACCGACGGCAACTGGTACTGGTT 
CGATAACTCGGGAGAGATGGCCACCGGCTGGAAGAAGATCGCGGACAAGTGGTACTATTTCAACGAGGAGGGC 
GCCATGAAGACCGGCTGGGTGAAGTATAAGGACACCTGGTACTACCTCGACGCCAAGGAGGGCGCCATGCAGT 
ATATCAAGGCCAACAGCAAGTTCATCGGCATCACCGAGGGAGTGATGGTCAGCAACGCCTTTATCCAGAGCGC 
CGACGGCACCGGATGGTACTACTTGAAGCCGGACGGCACCCTCGCGGATCGGCCCGAGAAGTTCATGTAC 

SEQ ID NO:25 

ATGGCCGCCGCCTACGTGCACAGCGACGGGTCCTACCCAAAGGACAAGTTCGAGAAGATCAACGGCACGTGGT 
ACTATTTCGACAGCAGCGGCTACATGCTCGCCGATCGCTGGCGCAAGCACACCGACGGGAACTGGTACTGGTT 
CGACAACTCTGGCGAGATGGCTACGGGGTGGAAGAAGATCGCCGACAAGTGGTACTACTTCAACGAGGAGGGC 
GCCATGAAGACCGGGTGGGTGAAGTACAAGGACACCTGGTACTACCTGGACGCTAAGGAGGGCGCCATGCAGT 
ACATCAAGGCCAACTCGAAGTTCATCGGGATCACCGAGGGCGTGATGGTCAGTAACGCTTTCATCCAGAGCGC 
GGACGGCACAGGCTGGTATTACCTGAAGCCCGATGGCACCCTGGCGGACAGACCTGAGAAATTCATGTAC 



FIG. 19 - Engineered CPC codon optimised sequence (SEQ ID NO:26) 
SEQ ID NO:26 

GACGGCTAGCGCCACCATGGCCGCCGCCTACGTGCATAGCGACGGGAGCTACCCCAAGGACAAGTTCGAGAAG 
ATCAACGGGACATGGTACTACTTCGACTCCTCCGGCTACATGCTCGCCGACCGCTGGCGGAAGCACACCGACG 
GCAACTGGTACTGGTTCGATAACTCGGGAGAGATGGCCACCGGCTGGAAGAAGATCGCGGACAAGTGGTACTA 
TTTCAACGAGGAGGGCGCCATGAAGACCGGCTGGGTGAAGTATAAGGACACCTGGTACTACCTCGACGCCAAG 
GAGGGCGCCATGCAGTATATCAAGGCCAACAGCAAGTTCATCGGCATCACCGAGGGAGTGATGGTCAGCAACG 
CCTTTATCCAGAGCGCCGACGGCACCGGATGGTACTACTTGAAGCCGGACGGCACCCTCGCGGATCGGCCCGA 
G |AAGTTCATGTAC| TGACTCGAG GCAG 

FIG. 20 - P501S CPC fusion candidate constructs and sequences 



[Schematic of constructs A to D: block diagrams assembled from the segments CPC, N term P501S and P501S (ΔN term).] 



Construct A = SEQ ID NO:37 (nucleotide) & 45 (polypeptide) 

GCGGCCGCGCCACCATGGCCGCCGCCTACGTGCATAGCGACGGGAGCTACCCCAAGGACA 
MAAAYVHSDGSYPKDK 

AGTTCGAGAAGATCAACGGGACATGGTACTACTTCGACTCCTCCGGCTACATGCTCGCCG 
FEKINGTWYYFDSSGYMLAD 

ACCGCTGGCGGAAGCACACCGACGGCAACTGGTACTGGTTCGATAACTCGGGAGAGATGG 
RWRKHTDGNWYWFDNSGEMA 

CCACCGGCTGGAAGAAGATCGCGGACAAGTGGTACTATTTCAACGAGGAGGGCGCCATGA 
TGWKKIADKWYYFNEEGAMK 

AGACCGGCTGGGTGAAGTATAAGGACACCTGGTACTACCTCGACGCCAAGGAGGGCGCCA 
TGWVKYKDTWYYLDAKEGAM 

TGCAGTATATCAAGGCCAACAGCAAGTTCATCGGCATCACCGAGGGAGTGATGGTCAGCA 
QYIKANSKFIGITEGVMVSN 

ACGCCTTTATCCAGAGCGCCGACGGCACCGGATGGTACTACTTGAAGCCGGACGGCACCC 
AFIQSADGTGWYYLKPDGTL 

TCGCGGATCGGCCCGAGAAGTTCATGTACATGGTGCTGGGCATCGGCCCCGTCCTGGGCC 
ADRPEKFMYMVLGIGPVLGL 

TCGTGTGTGTGCCCCTCCTCGGGAGTGCGTCCGATCATTGGCGGGGCCGCTACGGCCGCC 
VCVPLLGSASDHWRGRYGRR 

GCAGACCGTTCATCTGGGCCCTGAGCCTGGGCATCCTGCTCTCTCTCTTCCTGATCCCCC 
RPFIWALSLGILLSLFLIPR 

GGGCCGGCTGGCTGGCCGGCCTGCTGTGTCCCGACCCCCGCCCTCTGGAGCTGGCCCTCC 
AGWLAGLLCPDPRPLELALL 



TGATCCTGGGCGTGGGCCTGCTGGACTTCTGCGGCCAGGTGTGTTTCACTCCCCTGGAGG 
ILGVGLLDFCGQVCFTPLEA 

CTCTGCTCTCCGACCTCTTCCGCGACCCCGACCACTGTAGGCAGGCTTACAGCGTGTACG 
LLSDLFRDPDHCRQAYSVYA 

CCTTCATGATCAGTCTGGGGGGATGCCTGGGCTATCTGCTGCCCGCTATCGACTGGGACA 
FMISLGGCLGYLLPAIDWDT 

CCAGCGCCCTGGCCCCCTACCTGGGGACTCAGGAGGAGTGCCTGTTCGGCCTGCTCACCT 
SALAPYLGTQEECLFGLLTL 

TGATCTTCCTGACGTGCGTCGCCGCCACCCTGCTGGTGGCCGAGGAGGCGGCCCTGGGGC 
IFLTCVAATLLVAEEAALGP 

CCACCGAGCCCGCCGAGGGCCTGAGCGCTCCCAGCCTGAGCCCCCATTGCTGCCCGTGCA 
TEPAEGLSAPSLSPHCCPCR 

GGGCTAGGCTCGCCTTCAGGAATCTGGGCGCTTTGCTGCCCCGCCTGCATCAGCTGTGCT 
ARLAFRNLGALLPRLHQLCC 

GTCGCATGCCTCGCACCCTGCGCCGCCTGTTCGTCGCTGAGCTCTGTTCCTGGATGGCCC 
RMPRTLRRLFVAELCSWMAL 

TGATGACGTTCACCCTCTTCTACACCGACTTCGTGGGGGAGGGCCTGTACCAGGGCGTGC 
MTFTLFYTDFVGEGLYQGVP 

CCAGGGCCGAGCCCGGCACCGAGGCTAGGCGCCATTACGACGAGGGCGTCAGGATGGGCT 
RAEPGTEARRHYDEGVRMGS 

CTCTGGGCCTCTTCCTGCAGTGCGCCATCAGTCTGGTGTTCTCTCTGGTGATGGACCGGC 
LGLFLQCAISLVFSLVMDRL 

TGGTGCAGCGCTTCGGCACCCGGGCCGTGTACCTCGCCTCTGTGGCGGCTTTCCCCGTCG 
VQRFGTRAVYLASVAAFPVA 

CCGCCGGCGCGACCTGCCTGTCTCATTCTGTCGCCGTGGTGACCGCCAGCGCCGCCCTGA 
AGATCLSHSVAVVTASAALT 

CCGGCTTCACCTTCAGTGCGCTCCAGATTCTGCCCTACACCCTGGCGTCTCTGTACCATC 
GFTFSALQILPYTLASLYHR 

GCGAGAAGCAGGTGTTCCTGCCCAAGTACCGCGGGGACACAGGGGGAGCTTCCTCTGAGG 
EKQVFLPKYRGDTGGASSED 

ACAGCCTGATGACCAGCTTCTTGCCCGGCCCCAAGCCGGGGGCCCCTTTCCCCAACGGCC 
SLMTSFLPGPKPGAPFPNGH 

ATGTCGGGGCGGGCGGCAGCGGCCTGCTCCCTCCCCCCCCCGCCCTGTGCGGCGCTAGTG 
VGAGGSGLLPPPPALCGASA 

CCTGCGACGTGAGCGTGCGGGTGGTGGTGGGGGAGCCCACCGAGGCTAGGGTCGTGCCTG 
CDVSVRVVVGEPTEARVVPG 

GCCGGGGGATCTGCCTGGACCTGGCCATCCTCGACTCCGCCTTCCTGCTCTCCCAGGTGG 
RGICLDLAILDSAFLLSQVA 

CGCCCAGCCTGTTCATGGGCAGTATCGTGCAGCTGAGCCAGAGCGTGACCGCCTACATGG 
PSLFMGSIVQLSQSVTAYMV 

TGAGCGCCGCCGGCCTGGGGTTGGTGGCCATCTACTTTGCCACCCAGGTCGTGTTCGACA 
SAAGLGLVAIYFATQVVFDK 

AGAGCGATCTCGCCAAGTATAGCGCCTGAGGATCC 
SDLAKYSA* 



Construct B = SEQ ID NO:38 (nucleotide) & 46 (polypeptide) 

GCGGCCGCGCCACCATGGCCGCCGCCTACGTGCATAGCGACGGGAGCTACCCCAAGGACA 
MAAAYVHSDGSYPKDK 

AGTTCGAGAAGATCAACGGGACATGGTACTACTTCGACTCCTCCGGCTACATGCTCGCCG 
FEKINGTWYYFDSSGYMLAD 

ACCGCTGGCGGAAGCACACCGACGGCAACTGGTACTGGTTCGATAACTCGGGAGAGATGG 
RWRKHTDGNWYWFDNSGEMA 

CCACCGGCTGGAAGAAGATCGCGGACAAGTGGTACTATTTCAACGAGGAGGGCGCCATGA 
TGWKKIADKWYYFNEEGAMK 

AGACCGGCTGGGTGAAGTATAAGGACACCTGGTACTACCTCGACGCCAAGGAGGGCGCCA 
TGWVKYKDTWYYLDAKEGAM 

TGCAGTATATCAAGGCCAACAGCAAGTTCATCGGCATCACCGAGGGAGTGATGGTCAGCA 
QYIKANSKFIGITEGVMVSN 

ACGCCTTTATCCAGAGCGCCGACGGCACCGGATGGTACTACTTGAAGCCGGACGGCACCC 
AFIQSADGTGWYYLKPDGTL 

TCGCGGATCGGCCCGAGATGGTGCAGCGGCTGTGGGTGTCCCGGCTGCTGCGCCATAGAA 
ADRPEMVQRLWVSRLLRHRK 

AGGCCCAGTTGCTGCTGGTGAACCTGCTGACTTTCGGACTGGAGGTGTGCCTGGCTGCCG 
AQLLLVNLLTFGLEVCLAAG 

GGATCACGTACGTGCCCCCCCTGCTGCTGGAGGTGGGCGTGGAGGAGAAGTTCATGACAA 
ITYVPPLLLEVGVEEKFMTM 

TGGTGCTGGGCATCGGCCCCGTCCTGGGCCTCGTGTGTGTGCCCCTCCTCGGGAGTGCGT 
VLGIGPVLGLVCVPLLGSAS 

CCGATCATTGGCGGGGCCGCTACGGCCGCCGCAGACCGTTCATCTGGGCCCTGAGCCTGG 
DHWRGRYGRRRPFIWALSLG 

GCATCCTGCTCTCTCTCTTCCTGATCCCCCGGGCCGGCTGGCTGGCCGGCCTGCTGTGTC 
ILLSLFLIPRAGWLAGLLCP 

CCGACCCCCGCCCTCTGGAGCTGGCCCTCCTGATCCTGGGCGTGGGCCTGCTGGACTTCT 
DPRPLELALLILGVGLLDFC 

GCGGCCAGGTGTGTTTCACTCCCCTGGAGGCTCTGCTCTCCGACCTCTTCCGCGACCCCG 
GQVCFTPLEALLSDLFRDPD 

ACCACTGTAGGCAGGCTTACAGCGTGTACGCCTTCATGATCAGTCTGGGGGGATGCCTGG 
HCRQAYSVYAFMISLGGCLG 

GCTATCTGCTGCCCGCTATCGACTGGGACACCAGCGCCCTGGCCCCCTACCTGGGGACTC 
YLLPAIDWDTSALAPYLGTQ 

AGGAGGAGTGCCTGTTCGGCCTGCTCACCTTGATCTTCCTGACGTGCGTCGCCGCCACCC 
EECLFGLLTLIFLTCVAATL 

TGCTGGTGGCCGAGGAGGCGGCCCTGGGGCCCACCGAGCCCGCCGAGGGCCTGAGCGCTC 
LVAEEAALGPTEPAEGLSAP 

CCAGCCTGAGCCCCCATTGCTGCCCGTGCAGGGCTAGGCTCGCCTTCAGGAATCTGGGCG 
SLSPHCCPCRARLAFRNLGA 

CTTTGCTGCCCCGCCTGCATCAGCTGTGCTGTCGCATGCCTCGCACCCTGCGCCGCCTGT 
LLPRLHQLCCRMPRTLRRLF 

TCGTCGCTGAGCTCTGTTCCTGGATGGCCCTGATGACGTTCACCCTCTTCTACACCGACT 
VAELCSWMALMTFTLFYTDF 

TCGTGGGGGAGGGCCTGTACCAGGGCGTGCCCAGGGCCGAGCCCGGCACCGAGGCTAGGC 
VGEGLYQGVPRAEPGTEARR 

GCCATTACGACGAGGGCGTCAGGATGGGCTCTCTGGGCCTCTTCCTGCAGTGCGCCATCA 
HYDEGVRMGSLGLFLQCAIS 

GTCTGGTGTTCTCTCTGGTGATGGACCGGCTGGTGCAGCGCTTCGGCACCCGGGCCGTGT 
LVFSLVMDRLVQRFGTRAVY 

ACCTCGCCTCTGTGGCGGCTTTCCCCGTCGCCGCCGGCGCGACCTGCCTGTCTCATTCTG 
LASVAAFPVAAGATCLSHSV 

TCGCCGTGGTGACCGCCAGCGCCGCCCTGACCGGCTTCACCTTCAGTGCGCTCCAGATTC 
AVVTASAALTGFTFSALQIL 

TGCCCTACACCCTGGCGTCTCTGTACCATCGCGAGAAGCAGGTGTTCCTGCCCAAGTACC 
PYTLASLYHREKQVFLPKYR 

GCGGGGACACAGGGGGAGCTTCCTCTGAGGACAGCCTGATGACCAGCTTCTTGCCCGGCC 
GDTGGASSEDSLMTSFLPGP 

CCAAGCCGGGGGCCCCTTTCCCCAACGGCCATGTCGGGGCGGGCGGCAGCGGCCTGCTCC 
KPGAPFPNGHVGAGGSGLLP 

CTCCCCCCCCCGCCCTGTGCGGCGCTAGTGCCTGCGACGTGAGCGTGCGGGTGGTGGTGG 
PPPALCGASACDVSVRVVVG 

GGGAGCCCACCGAGGCTAGGGTCGTGCCTGGCCGGGGGATCTGCCTGGACCTGGCCATCC 
EPTEARVVPGRGICLDLAIL 

TCGACTCCGCCTTCCTGCTCTCCCAGGTGGCGCCCAGCCTGTTCATGGGCAGTATCGTGC 
DSAFLLSQVAPSLFMGSIVQ 

AGCTGAGCCAGAGCGTGACCGCCTACATGGTGAGCGCCGCCGGCCTGGGGTTGGTGGCCA 
LSQSVTAYMVSAAGLGLVAI 

TCTACTTTGCCACCCAGGTCGTGTTCGACAAGAGCGATCTCGCCAAGTATAGCGCCTGAG 
YFATQVVFDKSDLAKYSA* 

Construct C = SEQ ID NO:39 (nucleotide) & 47 (polypeptide) 

GCGGCCGCGCCACCATGGCCGCCGCCTACGTGCATAGCGACGGGAGCTACCCCAAGGACA 
MAAAYVHSDGSYPKDK 

AGTTCGAGAAGATCAACGGGACATGGTACTACTTCGACTCCTCCGGCTACATGCTCGCCG 
FEKINGTWYYFDSSGYMLAD 

ACCGCTGGCGGAAGCACACCGACGGCAACTGGTACTGGTTCGATAACTCGGGAGAGATGG 
RWRKHTDGNWYWFDNSGEMA 

CCACCGGCTGGAAGAAGATCGCGGACAAGTGGTACTATTTCAACGAGGAGGGCGCCATGA 
TGWKKIADKWYYFNEEGAMK 

AGACCGGCTGGGTGAAGTATAAGGACACCTGGTACTACCTCGACGCCAAGGAGGGCGCCA 
TGWVKYKDTWYYLDAKEGAM 

TGCAGTATATCAAGGCCAACAGCAAGTTCATCGGCATCACCGAGGGAGTGATGGTCAGCA 
QYIKANSKFIGITEGVMVSN 

ACGCCTTTATCCAGAGCGCCGACGGCACCGGATGGTACTACTTGAAGCCGGACGGCACCC 
AFIQSADGTGWYYLKPDGTL 

TCGCGGATCGGCCCGAGAAGTTCATGTACATGGTGCTGGGCATCGGCCCCGTCCTGGGCC 
ADRPEKFMYMVLGIGPVLGL 

TCGTGTGTGTGCCCCTCCTCGGGAGTGCGTCCGATCATTGGCGGGGCCGCTACGGCCGCC 
VCVPLLGSASDHWRGRYGRR 

GCAGACCGTTCATCTGGGCCCTGAGCCTGGGCATCCTGCTCTCTCTCTTCCTGATCCCCC 
RPFIWALSLGILLSLFLIPR 

GGGCCGGCTGGCTGGCCGGCCTGCTGTGTCCCGACCCCCGCCCTCTGGAGCTGGCCCTCC 
AGWLAGLLCPDPRPLELALL 

TGATCCTGGGCGTGGGCCTGCTGGACTTCTGCGGCCAGGTGTGTTTCACTCCCCTGGAGG 
ILGVGLLDFCGQVCFTPLEA 

CTCTGCTCTCCGACCTCTTCCGCGACCCCGACCACTGTAGGCAGGCTTACAGCGTGTACG 
LLSDLFRDPDHCRQAYSVYA 

CCTTCATGATCAGTCTGGGGGGATGCCTGGGCTATCTGCTGCCCGCTATCGACTGGGACA 
FMISLGGCLGYLLPAIDWDT 

CCAGCGCCCTGGCCCCCTACCTGGGGACTCAGGAGGAGTGCCTGTTCGGCCTGCTCACCT 
SALAPYLGTQEECLFGLLTL 

TGATCTTCCTGACGTGCGTCGCCGCCACCCTGCTGGTGGCCGAGGAGGCGGCCCTGGGGC 
IFLTCVAATLLVAEEAALGP 

CCACCGAGCCCGCCGAGGGCCTGAGCGCTCCCAGCCTGAGCCCCCATTGCTGCCCGTGCA 
TEPAEGLSAPSLSPHCCPCR 

GGGCTAGGCTCGCCTTCAGGAATCTGGGCGCTTTGCTGCCCCGCCTGCATCAGCTGTGCT 
ARLAFRNLGALLPRLHQLCC 

GTCGCATGCCTCGCACCCTGCGCCGCCTGTTCGTCGCTGAGCTCTGTTCCTGGATGGCCC 
RMPRTLRRLFVAELCSWMAL 

TGATGACGTTCACCCTCTTCTACACCGACTTCGTGGGGGAGGGCCTGTACCAGGGCGTGC 
MTFTLFYTDFVGEGLYQGVP 

CCAGGGCCGAGCCCGGCACCGAGGCTAGGCGCCATTACGACGAGGGCGTCAGGATGGGCT 
RAEPGTEARRHYDEGVRMGS 

CTCTGGGCCTCTTCCTGCAGTGCGCCATCAGTCTGGTGTTCTCTCTGGTGATGGACCGGC 
LGLFLQCAISLVFSLVMDRL 

TGGTGCAGCGCTTCGGCACCCGGGCCGTGTACCTCGCCTCTGTGGCGGCTTTCCCCGTCG 
VQRFGTRAVYLASVAAFPVA 

CCGCCGGCGCGACCTGCCTGTCTCATTCTGTCGCCGTGGTGACCGCCAGCGCCGCCCTGA 
AGATCLSHSVAVVTASAALT 

CCGGCTTCACCTTCAGTGCGCTCCAGATTCTGCCCTACACCCTGGCGTCTCTGTACCATC 
GFTFSALQILPYTLASLYHR 

GCGAGAAGCAGGTGTTCCTGCCCAAGTACCGCGGGGACACAGGGGGAGCTTCCTCTGAGG 
EKQVFLPKYRGDTGGASSED 

ACAGCCTGATGACCAGCTTCTTGCCCGGCCCCAAGCCGGGGGCCCCTTTCCCCAACGGCC 
SLMTSFLPGPKPGAPFPNGH 

ATGTCGGGGCGGGCGGCAGCGGCCTGCTCCCTCCCCCCCCCGCCCTGTGCGGCGCTAGTG 
VGAGGSGLLPPPPALCGASA 

CCTGCGACGTGAGCGTGCGGGTGGTGGTGGGGGAGCCCACCGAGGCTAGGGTCGTGCCTG 
CDVSVRVVVGEPTEARVVPG 

GCCGGGGGATCTGCCTGGACCTGGCCATCCTCGACTCCGCCTTCCTGCTCTCCCAGGTGG 
RGICLDLAILDSAFLLSQVA 

CGCCCAGCCTGTTCATGGGCAGTATCGTGCAGCTGAGCCAGAGCGTGACCGCCTACATGG 
PSLFMGSIVQLSQSVTAYMV 

TGAGCGCCGCCGGCCTGGGGTTGGTGGCCATCTACTTTGCCACCCAGGTCGTGTTCGACA 
SAAGLGLVAIYFATQVVFDK 

AGAGCGATCTCGCCAAGTATAGCGCCATGGTGCAGCGGCTGTGGGTGTCCCGGCTGCTGC 
SDLAKYSAMVQRLWVSRLLR 

GCCATAGAAAGGCCCAGTTGCTGCTGGTGAACCTGCTGACTTTCGGACTGGAGGTGTGCC 
HRKAQLLLVNLLTFGLEVCL 

TGGCTGCCGGGATCACGTACGTGCCCCCCCTGCTGCTGGAGGTGGGCGTGGAGGAGTGAG 
AAGITYVPPLLLEVGVEE* 

GATCC 

Construct D = SEQ ID NO:40 (nucleotide) & 48 (polypeptide) 

GCGGCCGCGCCACCATGGTGCAGCGGCTGTGGGTGTCCCGGCTGCTGCGCCATAGAAAGG 
MVQRLWVSRLLRHRKA 

CCCAGTTGCTGCTGGTGAACCTGCTGACTTTCGGACTGGAGGTGTGCCTGGCTGCCGGGA 
QLLLVNLLTFGLEVCLAAGI 

TCACGTACGTGCCCCCCCTGCTGCTGGAGGTGGGCGTGGAGGAGATGGCCGCCGCCTACG 
TYVPPLLLEVGVEEMAAAYV 



TGCATAGCGACGGGAGCTACCCCAAGGACAAGTTCGAGAAGATCAACGGGACATGGTACT 
HSDGSYPKDKFEKINGTWYY 

ACTTCGACTCCTCCGGCTACATGCTCGCCGACCGCTGGCGGAAGCACACCGACGGCAACT 
FDSSGYMLADRWRKHTDGNW 

GGTACTGGTTCGATAACTCGGGAGAGATGGCCACCGGCTGGAAGAAGATCGCGGACAAGT 
YWFDNSGEMATGWKKIADKW 

GGTACTATTTCAACGAGGAGGGCGCCATGAAGACCGGCTGGGTGAAGTATAAGGACACCT 
YYFNEEGAMKTGWVKYKDTW 

GGTACTACCTCGACGCCAAGGAGGGCGCCATGCAGTATATCAAGGCCAACAGCAAGTTCA 
YYLDAKEGAMQYIKANSKFI 

TCGGCATCACCGAGGGAGTGATGGTCAGCAACGCCTTTATCCAGAGCGCCGACGGCACCG 
GITEGVMVSNAFIQSADGTG 

GATGGTACTACTTGAAGCCGGACGGCACCCTCGCGGATCGGCCCGAGAAGTTCATGTACA 
WYYLKPDGTLADRPEKFMYM 

TGGTGCTGGGCATCGGCCCCGTCCTGGGCCTCGTGTGTGTGCCCCTCCTCGGGAGTGCGT 
VLGIGPVLGLVCVPLLGSAS 

CCGATCATTGGCGGGGCCGCTACGGCCGCCGCAGACCGTTCATCTGGGCCCTGAGCCTGG 
DHWRGRYGRRRPFIWALSLG 

GCATCCTGCTCTCTCTCTTCCTGATCCCCCGGGCCGGCTGGCTGGCCGGCCTGCTGTGTC 
ILLSLFLIPRAGWLAGLLCP 

CCGACCCCCGCCCTCTGGAGCTGGCCCTCCTGATCCTGGGCGTGGGCCTGCTGGACTTCT 
DPRPLELALLILGVGLLDFC 

GCGGCCAGGTGTGTTTCACTCCCCTGGAGGCTCTGCTCTCCGACCTCTTCCGCGACCCCG 
GQVCFTPLEALLSDLFRDPD 

ACCACTGTAGGCAGGCTTACAGCGTGTACGCCTTCATGATCAGTCTGGGGGGATGCCTGG 
HCRQAYSVYAFMISLGGCLG 

GCTATCTGCTGCCCGCTATCGACTGGGACACCAGCGCCCTGGCCCCCTACCTGGGGACTC 
YLLPAIDWDTSALAPYLGTQ 

AGGAGGAGTGCCTGTTCGGCCTGCTCACCTTGATCTTCCTGACGTGCGTCGCCGCCACCC 
EECLFGLLTLIFLTCVAATL 

TGCTGGTGGCCGAGGAGGCGGCCCTGGGGCCCACCGAGCCCGCCGAGGGCCTGAGCGCTC 
LVAEEAALGPTEPAEGLSAP 

CCAGCCTGAGCCCCCATTGCTGCCCGTGCAGGGCTAGGCTCGCCTTCAGGAATCTGGGCG 
SLSPHCCPCRARLAFRNLGA 

CTTTGCTGCCCCGCCTGCATCAGCTGTGCTGTCGCATGCCTCGCACCCTGCGCCGCCTGT 
LLPRLHQLCCRMPRTLRRLF 

TCGTCGCTGAGCTCTGTTCCTGGATGGCCCTGATGACGTTCACCCTCTTCTACACCGACT 
VAELCSWMALMTFTLFYTDF 



TCGTGGGGGAGGGCCTGTACCAGGGCGTGCCCAGGGCCGAGCCCGGCACCGAGGCTAGGC 
VGEGLYQGVPRAEPGTEARR 

GCCATTACGACGAGGGCGTCAGGATGGGCTCTCTGGGCCTCTTCCTGCAGTGCGCCATCA 
HYDEGVRMGSLGLFLQCAIS 

GTCTGGTGTTCTCTCTGGTGATGGACCGGCTGGTGCAGCGCTTCGGCACCCGGGCCGTGT 
LVFSLVMDRLVQRFGTRAVY 

ACCTCGCCTCTGTGGCGGCTTTCCCCGTCGCCGCCGGCGCGACCTGCCTGTCTCATTCTG 
LASVAAFPVAAGATCLSHSV 

TCGCCGTGGTGACCGCCAGCGCCGCCCTGACCGGCTTCACCTTCAGTGCGCTCCAGATTC 
AVVTASAALTGFTFSALQIL 

TGCCCTACACCCTGGCGTCTCTGTACCATCGCGAGAAGCAGGTGTTCCTGCCCAAGTACC 
PYTLASLYHREKQVFLPKYR 

GCGGGGACACAGGGGGAGCTTCCTCTGAGGACAGCCTGATGACCAGCTTCTTGCCCGGCC 
GDTGGASSEDSLMTSFLPGP 

CCAAGCCGGGGGCCCCTTTCCCCAACGGCCATGTCGGGGCGGGCGGCAGCGGCCTGCTCC 
KPGAPFPNGHVGAGGSGLLP 

CTCCCCCCCCCGCCCTGTGCGGCGCTAGTGCCTGCGACGTGAGCGTGCGGGTGGTGGTGG 
PPPALCGASACDVSVRVVVG 

GGGAGCCCACCGAGGCTAGGGTCGTGCCTGGCCGGGGGATCTGCCTGGACCTGGCCATCC 
EPTEARVVPGRGICLDLAIL 

TCGACTCCGCCTTCCTGCTCTCCCAGGTGGCGCCCAGCCTGTTCATGGGCAGTATCGTGC 
DSAFLLSQVAPSLFMGSIVQ 

AGCTGAGCCAGAGCGTGACCGCCTACATGGTGAGCGCCGCCGGCCTGGGGTTGGTGGCCA 
LSQSVTAYMVSAAGLGLVAI 

TCTACTTTGCCACCCAGGTCGTGTTCGACAAGAGCGATCTCGCCAAGTATAGCGCCTGAG 
YFATQVVFDKSDLAKYSA* 

FIG. 21 - Western blot analysis of CHO cells following transient transfection with 
P501S (JNW680), CPC-P501S (JNW735) and empty vector control. 



[Blot image: lanes 1 to 5, with the 62 kDa position marked.] 




Lane Sample 
1 CPC-P501S (JNW735) 
2 CPC P501S protein (62.5 ng) 
3 P501S (JNW680) 
4 P501S (JNW680) 
5 Empty vector control 




FIG. 22 - Anti-P501S antibody responses following immunisation at day 0, 21 & 42 
with pVAC-P501S (JNW680, mice B1-9) or empty vector (pVAC, mice A1-6). 

FIG. 23 - Peptide library screen using C57BL/6 mice immunised at day 0, 21, 42, and 
70 with pVAC-P501S (JNW680). 

FIG. 24 - Cellular responses by ELISPOT at day 77 following PMID immunisation at 
day 0, 21, 42, and 70 with pVAC-P501S (JNW680, B6-9) and pVAC empty (A4-6). 

Graph A shows the IFN-γ responses whilst Graph B shows the IL-2 responses. 

FIG. 25 - Comparison of P501S and CPC-P501S. 




FIG. 26 - Immune response (lymphoproliferation on spleen cells) following protein 
immunisation with CPC-P501S. 

FIG. 27 - Evaluation of the immune response to different CPC-P501S constructs 

FIG. 28. MUC1-CPC DNA and polypeptide sequences 
FIG. 28A. DNA sequence (SEQ ID NO.49) 

ATGACACCGGGCACCCAGTCTCCTTTCTTCCTGCTGCTGCTCCTCACAGTGCTTACAGTTGTTACAGGTTCTG 

GTCATGCAAGCTCTACCCCAGGTGGAGAAAAGGAGACTTCGGCTACCCAGAGAAGTTCAGTGCCCAGCTCTAC 

TGAGAAGAATGCTGTGAGTATGACCAGCAGCGTACTCTCCAGCCACAGCCCCGGTTCAGGCTCCTCCACCACT 

CAGGGACAGGATGTCACTCTGGCCCCGGCCACGGAACCAGCTTCAGGTTCAGCTGCCACCTGGGGACAGGATG 

TCACCTCGGTCCCAGTCACCAGGCCAGCCCTGGGCTCCACCACCCCGCCAGCCCACGATGTCACCTCAGCCCC 

GGACAACAAGCCAGCCCCGGGCTCCACCGCCCCCCCAGCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCG 

CCCCCGGGCTCCACCGCCCCCCCAGCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGCCCCCGGGCTCCA 

CCGCGCCCGCAGCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAGC 

CCATGGTGTCACCTCGGCCCCGGACAACAGGCCCGCCTTGGCGTCCACCGCCCCTCCAGTCCACAATGTCACC 

TCGGCCTCAGGCTCTGCATCAGGCTCAGCTTCTACTCTGGTGCACAACGGCACCTCTGCCAGGGCTACCACAA 

CCCCAGCCAGCAAGAGCACTCCATTCTCAATTCCCAGCCACCACTCTGATACTCCTACCACCCTTGCCAGCCA 

TAGCACCAAGACTGATGCCAGTAGCACTCACCATAGCACGGTACCTCCTCTCACCTCCTCCAATCACAGCACT 

TCTCCCCAGTTGTCTACTGGGGTCTCTTTCTTTTTCCTGTCTTTTCACATTTCAAACCTCCAGTTTAATTCCT 

CTCTGGAAGATCCCAGCACCGACTACTACCAAGAGCTGCAGAGAGACATTTCTGAAATGTTTTTGCAGATTTA 

TAAACAAGGGGGTTTTCTGGGCCTCTCCAATATTAAGTTCAGGCCAGGATCTGTGGTGGTACAATTGACTCTG 

GCCTTCCGAGAAGGTACCATCAATGTCCACGACGTGGAGACACAGTTCAATCAGTATAAAACGGAAGCAGCCT 

CTCGATATAACCTGACGATCTCAGACGTCAGCGTGAGTGATGTGCCATTTCCTTTCTCTGCCCAGTCTGGGGC 

TGGGGTGCCAGGCTGGGGCATCGCGCTGCTGGTGCTGGTCTGTGTTCTGGTTGCGCTGGCCATTGTCTATCTC 

ATTGCCTTGGCTGTCTGTCAGTGCCGCCGAAAGAACTACGGGCAGCTGGACATCTTTCCAGCCCGGGATACCT 

ACCATCCTATGAGCGAGTACCCCACCTACCACACCCATGGGCGCTATGTGCCCCCTAGCAGTACCGATCGTAG 

CCCCTATGAGAAGGTTTCTGCAGGTAATGGTGGCAGCAGCCTCTCTTACACAAACCCAGCAGTGGCAGCCACT 

TCTGCCAACTTGATGGCGGCCGCTTACGTACATTCCGACGGCTCTTATCCAAAAGACAAGTTTGAGAAAATCA 

ATGGCACTTGGTACTACTTTGACAGTTCAGGCTATATGCTTGCAGACCGCTGGAGGAAGCACACAGACGGCAA 

CTGGTACTGGTTCGACAACTCAGGCGAAATGGCTACAGGCTGGAAGAAAATCGCTGATAAGTGGTACTATTTC 

AACGAAGAAGGTGCCATGAAGACAGGCTGGGTCAAGTACAAGGACACTTGGTACTACTTAGACGCTAAAGAAG 

GCGCCATGCAATACATCAAGGCTAACTCTAAGTTCATTGGTATCACTGAAGGCGTCATGGTATCAAATGCCTT 

TATCCAGTCAGCGGACGGAACAGGCTGGTACTACCTCAAACCAGACGGAACACTGGCAGACAGGCCAGAATGA 

FIG. 28 B. MUC1-CPC polypeptide sequence (SEQ ID NO.50) 

MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSVPSSTEKNAVSMTSSVLSSHSPGSGSSTT 
QGQDVTLAPATEPASGSAATWGQDVTSVPVTRPALGSTTPPAHDVTSAPDNKPAPGSTAPPAHGVTSAPDTRP 
PPGSTAPPAHGVTSAPDTRPPPGSTAPAAHGVTSAPDTRPAPGSTAPPAHGVTSAPDNRPALASTAPPVHNVT 
SASGSASGSASTLVHNGTSARATTTPASKSTPFSIPSHHSDTPTTLASHSTKTDASSTHHSTVPPLTSSNHST 
SPQLSTGVSFFFLSFHISNLQFNSSLEDPSTDYYQELQRDISEMFLQIYKQGGFLGLSNIKFRPGSVVVQLTL 
AFREGTINVHDVETQFNQYKTEAASRYNLTISDVSVSDVPFPFSAQSGAGVPGWGIALLVLVCVLVALAIVYL 
IALAVCQCRRKNYGQLDIFPARDTYHPMSEYPTYHTHGRYVPPSSTDRSPYEKVSAGNGGSSLSYTNPAVAAT 
SANLMAAAYVHSDGSYPKDKFEKINGTWYYFDSSGYMLADRWRKHTDGNWYWFDNSGEMATGWKKIADKWYYF 
NEEGAMKTGWVKYKDTWYYLDAKEGAMQYIKANSKFIGITEGVMVSNAFIQSADGTGWYYLKPDGTLADRPE 



FIG. 29. ss-CPC-MUC1 construct and sequence 
FIG. 29A. DNA sequence (SEQ ID NO.51) 

ATGGGATGGAGCTGTATCATCCTCTTCTTGGTAGCAACAGCTACAGGTGTCCACTCCCAGGTCCAAATGGCGG 
CCGCTTACGTACATTCCGACGGCTCTTATCCAAAAGACAAGTTTGAGAAAATCAATGGCACTTGGTACTACTT 
TGACAGTTCAGGCTATATGCTTGCAGACCGCTGGAGGAAGCACACAGACGGCAACTGGTACTGGTTCGACAAC 
TCAGGCGAAATGGCTACAGGCTGGAAGAAAATCGCTGATAAGTGGTACTATTTCAACGAAGAAGGTGCCATGA 
AGACAGGCTGGGTGAAGTACAAGGACACTTGGTACTACTTAGACGCTAAAGAAGGCGCCATGCAATACATCAA 
GGCTAACTCTAAGTTCATTGGTATCACTGAAGGCGTCATGGTATCAAATGCCTTTATCCAGTCAGCGGACGGA 
ACAGGCTGGTACTACCTCAAACCAGACGGAACACTGGCAGACAGGCCAGAAATGACACCGGGCACCCAGTCTC 
CTTTCTTCCTGCTGCTGCTCCTCACAGTGCTTACAGTTGTTACAGGTTCTGGTCATGCAAGCTCTACCCCAGG 
TGGAGAAAAGGAGACTTCGGCTACCCAGAGAAGTTCAGTGCCCAGCTCTACTGAGAAGAATGCTGTGAGTATG 
ACCAGCAGCGTACTCTCCAGCCACAGCCCCGGTTCAGGCTCCTCCACCACTCAGGGACAGGATGTCACTCTGG 
CCCCGGCCACGGAACCAGCTTCAGGTTCAGCTGCCACCTGGGGACAGGATGTCACCTCGGTCCCAGTCACCAG 
GCCAGCCCTGGGCTCCACCACCCCGCCAGCCCACGATGTCACCTCAGCCCCGGACAACAAGCCAGCCCCGGGC 
TCCACCGCCCCCCCAGCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGCCCCCGGGCTCCACCGCCCCCC 
CAGCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGCCCCCGGGCTCCACCGCGCCCGCAGCCCACGGTGT 
CACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAGCCCATGGTGTCACCTCGGCCCCG 
GACAACAGGCCCGCCTTGGCGTCCACCGCCCCTCCAGTCCACAATGTCACCTCGGCCTCAGGCTCTGCATCAG 
GCTCAGCTTCTACTCTGGTGCACAACGGCACCTCTGCCAGGGCTACCACAACCCCAGCCAGCAAGAGCACTCC 
ATTCTCAATTCCCAGCCACCACTCTGATACTCCTACCACCCTTGCCAGCCATAGCACCAAGACTGATGCCAGT 
AGCACTCACCATAGCACGGTACCTCCTCTCACCTCCTCCAATCACAGCACTTCTCCCCAGTTGTCTACTGGGG 
TCTCTTTCTTTTTCCTGTCTTTTCACATTTCAAACCTCCAGTTTAATTCCTCTCTGGAAGATCCCAGCACCGA 
CTACTACCAAGAGCTGCAGAGAGACATTTCTGAAATGTTTTTGCAGATTTATAAACAAGGGGGTTTTCTGGGC 
CTCTCCAATATTAAGTTCAGGCCAGGATCTGTGGTGGTACAATTGACTCTGGCCTTCCGAGAAGGTACCATCA 
ATGTCCACGACGTGGAGACACAGTTCAATCAGTATAAAACGGAAGCAGCCTCTCGATATAACCTGACGATCTC 
AGACGTCAGCGTGAGTGATGTGCCATTTCCTTTCTCTGCCCAGTCTGGGGCTGGGGTGCCAGGCTGGGGCATC 
GCGCTGCTGGTGCTGGTCTGTGTTCTGGTTGCGCTGGCCATTGTCTATCTCATTGCCTTGGCTGTCTGTCAGT 
GCCGCCGAAAGAACTACGGGCAGCTGGACATCTTTCCAGCCCGGGATACCTACCATCCTATGAGCGAGTACCC 
CACCTACCACACCCATGGGCGCTATGTGCCCCCTAGCAGTACCGATCGTAGCCCCTATGAGAAGGTTTCTGCA 
GGTAATGGTGGCAGCAGCCTCTCTTACACAAACCCAGCAGTGGCAGCCACTTCTGCCAACTTGTAG 
FIG. 29B. ss-CPC-MUC1 polypeptide sequence (SEQ ID NO.52) 

MGWSCIILFLVATATGVHSQVQMAAAYVHSDGSYPKDKFEKINGTWYYFDSSGYMLADRWRKHTDGNWYWFDN 

SGEMATGWKKIADKWYYFNEEGAMKTGWVKYKDTWYYLDAKEGAMQYIKANSKFIGITEGVMVSNAFIQSADG 

TGWYYLKPDGTLADRPEMTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSVPSSTEKNAVSM 

TSSVLSSHSPGSGSSTTQGQDVTLAPATEPASGSAATWGQDVTSVPVTRPALGSTTPPAHDVTSAPDNKPAPG 

STAPPAHGVTSAPDTRPPPGSTAPPAHGVTSAPDTRPPPGSTAPAAHGVTSAPDTRPAPGSTAPPAHGVTSAP 

DNRPALASTAPPVHNVTSASGSASGSASTLVHNGTSARATTTPASKSTPFSIPSHHSDTPTTLASHSTKTDAS 

STHHSTVPPLTSSNHSTSPQLSTGVSFFFLSFHISNLQFNSSLEDPSTDYYQELQRDISEMFLQIYKQGGFLG 

LSNIKFRPGSVVVQLTLAFREGTINVHDVETQFNQYKTEAASRYNLTISDVSVSDVPFPFSAQSGAGVPGWGI 

ALLVLVCVLVALAIVYLIALAVCQCRRKNYGQLDIFPARDTYHPMSEYPTYHTHGRYVPPSSTDRSPYEKVSA 

GNGGSSLSYTNPAVAATSANL 



